Abstract


Most computer vision systems perform object recognition on the basis of the features extracted from a
single image of the object. The problem with this approach is that it implicitly assumes that the available
features are sufficient to determine the identity and pose of the object uniquely. If this assumption is not
met, then the feature set is insufficient and ambiguity results. Consequently, much research in computer
vision has gone towards finding sets of features that are sufficient for specific tasks, with the result that
each system has its own associated set of features. A single, general feature set would be desirable.
However, research in automatic generation of object recognition programs has demonstrated that
pre-determined, fixed feature sets are often incapable of providing enough information to unambiguously
determine object identity and pose. One approach to overcoming the inadequacy of any feature set is to
utilize multiple sensor observations obtained from different viewpoints, and combine them with
knowledge of the 3D structure of the object to perform unambiguous object recognition. This paper presents
initial results towards performing object recognition using multiple observations to resolve ambiguities.
Starting from the premise that sensor motions should be planned out in advance, the difficulties involved
in planning with ambiguous information are discussed. A representation for planning that combines
geometric information with viewpoint uncertainty is presented. A sensor planner utilizing the
representation was implemented, and the results of object recognition experiments performed with the planner
are discussed.

1  Introduction

Most computer vision systems perform object recognition on the basis of the information contained in a single image.
Typically, a set of features is extracted from the image, and the extracted features are matched against model features,
with the best match determining the result. The problem with this approach is that it implicitly assumes that the
available features are sufficient to determine the identity and pose (position and orientation) of the object uniquely. If this
assumption is not met, ambiguity results. Ambiguity may take the form of multiple object identities, multiple poses,
or both.

Much research in computer vision has gone towards finding sets of features that are sufficient to perform specific
tasks. The result has been a number of systems that work well in their own specific domains, but are not extendable to
other domains. Clearly, a general feature set that would be good for most object recognition tasks is desirable.
However, recent research in the automatic generation of object recognition programs has demonstrated that
pre-determined, fixed feature sets are often incapable of providing enough information to unambiguously determine object
identity and pose. Given any set of features, it is possible to specify a set of objects for which the feature set is
insufficient and will result in ambiguity.

One approach to overcoming the problem of insufficient feature sets is to utilize multiple sensor observations
obtained from different viewpoints, and combine this information with knowledge of the 3D structure of the objects
in order to perform unambiguous object recognition. By making use of observations from multiple viewpoints, two
objects that have several poses which are indistinguishable can still be recognized as long as there is a distinguishing
pattern of observations for each object. As a simple example, the front ends of a sedan and a station wagon may be
indistinguishable, but the side views are not.

For complex sets of objects, more than a single additional observation may be required to resolve ambiguity; in fact,
it is possible that a pair of objects could differ only in the sequence of observations, rather than in the value of any
given observation. Consequently, multiple observations should not be made at random, but should be planned out
based on knowledge of the objects and on the basis of the observations already made.

The purpose of the research described in this report is to explore the multiple observation strategy for object
recognition. To contrast this with other object recognition research, consider a graphical representation of recognition
strategies, as in Figure 1. The x-axis represents the size and complexity of the feature set, and the y-axis represents the
number of observations used. Most current computer vision research is focused on the single line representing
observation strategies employing a single observation. There is an entire space of strategies left to explore. In this research,
we are beginning to explore the region near the y-axis; that is, we employ relatively small, simple feature sets, but
rely on multiple observations. From a practical point of view, our research can be thought of as exploring means of
using multiple observations from different viewpoints to increase the information available from cheap, simple
sensors.


This report is structured as follows. The next section is an overview of object recognition and related work. In section
3, we restrict ourselves to the case in which the sensor and object each have a single degree of freedom, and examine
the data structures and planning techniques necessary to utilize multiple observations. Section 4 discusses extensions
of the restricted techniques to higher degrees of freedom. In section 5, we present the results of applying multiple
observations to the problem of object localization in the single degree of freedom case. Both simulated and real
experiments were performed, and the results are demonstrated in several application domains. Finally, in the last
section, we discuss the significance of our approach, and our plans for future research.

2  Background  and  Related  Research

There are two areas of research leading up to the work reported here. The first area of research is that of planning for
sensor observations. The second area is that of automatic compilation of vision programs. Each of the areas is
discussed below.

Planning  of  Sensor  Operations 

The early work in planning of sensor observations was concerned only with single observations. That is, the question
addressed was that of determining the optimal sensor and light source position to obtain the best image. As tasks
became more complex, and the number and similarity of objects increased, the need for multiple observations became
apparent.

The work of Cowan and Kovesi [6] examined the problem of automatically generating the collection of possible
camera locations given a set of sensing requirements. Their approach made use of explicit models of both the object and
the camera, and solutions satisfied requirements of spatial resolution, focus, and visibility. Their approach converts
each requirement into a geometric constraint. Each requirement defines a region of three-dimensional space that
satisfies the requirement, and the intersection of the regions is the set of solutions.

Yi et al. [25] devised an optimization approach to determine the location of both the sensor and light source to obtain
the best image. Their approach was formulated in terms of edge visibility; they considered both geometric visibility, which
specifies how much of an edge is visible (not occluded), and photometric visibility, which specifies how much of the
edge has sufficient contrast to be detectable. Assuming a fixed distance from object to sensor, the viewing sphere is
uniformly sampled, and at each sample, measures of both geometric and photometric edge visibility are computed. A
given vision task may be optimized in one of two ways: either maximizing the number of edges visible in an image,
or making as much of a given edge visible as possible. Optimality criteria for each condition were determined.

Goldberg and Mason [10] investigated the problem of determining optimal sequences of squeeze grasp operations for
determining object pose. They applied the Bayesian framework to the problem. Assuming a uniform distribution of
initial poses and a frictionless parallel-jaw gripper, they demonstrated a program for automatic planning of sequences
of grasps that optimize the robot’s expected throughput. Particularly notable is their use of the object diameter
function. For polygonal objects, the diameter function is piecewise sinusoidal, but during a squeeze, the object rotates to
reduce the diameter, terminating in a local minimum. Thus, a grasping operation not only reduces uncertainty, but
converts the diameter function into a piecewise constant function which is more easily analyzed. Since the goal was
optimal plans, they used breadth-first search to expand the state space.
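
To make the diameter function concrete, the following is a minimal sketch, not Goldberg and Mason's program. It assumes a polygon given as a vertex list and models a squeeze as a downhill walk on the sampled diameter function; the names `diameter` and `squeeze`, the step size, and the rectangle are all illustrative assumptions.

```python
import math

def diameter(vertices, theta):
    """Gap between two parallel jaws closing on the polygon along direction
    theta (radians): the extent of the polygon projected onto that direction."""
    ux, uy = math.cos(theta), math.sin(theta)
    proj = [x * ux + y * uy for x, y in vertices]
    return max(proj) - min(proj)

def squeeze(vertices, theta, step=math.radians(0.5), max_steps=720):
    """Hypothetical squeeze model: change the relative jaw angle in small
    steps, always toward the smaller diameter, until a sampled local
    minimum is reached."""
    for _ in range(max_steps):
        here = diameter(vertices, theta)
        left = diameter(vertices, theta - step)
        right = diameter(vertices, theta + step)
        if left >= here and right >= here:
            break  # local minimum of the diameter function
        theta += -step if left < right else step
    return theta

# Illustrative 2 x 1 rectangle (not an object from the report).
rect = [(0, 0), (2, 0), (2, 1), (0, 1)]
final = squeeze(rect, math.radians(30))
print(round(math.degrees(final), 1), round(diameter(rect, final), 3))
```

Under this model every starting angle ends at one of a small number of local minima, which is why the diameter measured after a squeeze takes only a few distinct values.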

In the domain of learning robots, Ming Tan [23] addressed the problem of learning sensing strategies. His system,
CSL, was implemented on a real robot, and emphasized learning of minimum cost, robust sensing strategies. Given a
basic knowledge of moving, sensing, and grasping procedures (cost, preconditions, features, and expected errors) and
a set of training objects labeled by their grasping procedures, CSL directed a robot to interact with the objects and
build up a set of procedures for object recognition and grasping while minimizing the expected cost.

Maver and Bajcsy [18] investigated the problem of describing a random arrangement of unknown objects in a scene,
rather than identifying a known object. They used a laser scanning system which measures range to visible points.
Occluded regions were modeled as polygons. Based on the height information and the geometry of the edges of the
polygonal approximation, the next best view is determined.

Liu and Tsai [17] used multiple 2D camera views to recognize 3D objects. Recognition is performed by matching 2D
silhouette shape features against model features taken from a set of fixed camera views. The system made use of two
cameras and a turntable for translation and rotation. Their system first reduces ambiguity by taking images from
above the turntable to normalize the top-view shape, position the object centroid, and align the object principal axis.
Next, a side view is taken, and the features analyzed. If recognition has not yet been accomplished, the object is rotated by
45° and a new image is acquired and analyzed; this is repeated until recognition succeeds.

Hutchinson and Kak [14] demonstrated a system for dynamically planning sensing strategies, based on the current
best estimate of the world. Given a work cell with well-defined sensing capabilities, their approach is to automatically
propose a sensing operation, and then to determine the maximum ambiguity which might remain if that operation
were applied. The system then selects the operation which minimizes the remaining ambiguity. Dempster-Shafer
theory [22] was used to combine evidence and analyze proposed operations.
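
The selection rule described above (choose the operation whose worst-case remaining ambiguity is smallest) can be sketched as follows. This is a simplified illustration, not Hutchinson and Kak's system: ambiguity is measured here by simply counting hypotheses that predict the same observation, rather than with Dempster-Shafer belief functions, and the hypotheses, operations, and prediction table are invented for the example.

```python
from collections import Counter

def worst_case_ambiguity(hypotheses, predict, operation):
    """Largest number of hypotheses that could remain after `operation`.

    `predict(h, operation)` returns the observation expected if hypothesis
    `h` were true; hypotheses predicting the same observation cannot be
    told apart by that operation.
    """
    outcomes = Counter(predict(h, operation) for h in hypotheses)
    return max(outcomes.values())

def select_operation(hypotheses, predict, operations):
    """Pick the operation whose worst-case remaining ambiguity is smallest."""
    return min(operations,
               key=lambda op: worst_case_ambiguity(hypotheses, predict, op))

# Hypothetical prediction table: operation -> hypothesis -> expected observation.
PREDICTED = {
    "top_view":  {"A": "square", "B": "square", "C": "square", "D": "circle"},
    "side_view": {"A": "tall",   "B": "tall",   "C": "short",  "D": "short"},
}

def predict(hypothesis, operation):
    return PREDICTED[operation][hypothesis]

hypotheses = ["A", "B", "C", "D"]
best = select_operation(hypotheses, predict, list(PREDICTED))
print(best)  # side_view leaves at most 2 hypotheses; top_view may leave 3
```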

Safranek et al. [20] addressed the problem of combining low-level measurements, each of which is uncertain, in order
to verify the location of an object. They showed that the combination could be achieved via Dempster-Shafer theory
using binary frames of discernment. They showed that with huge amounts of data, belief functions can become
saturated, leading to erroneous conclusions that cannot be altered by additional data.
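
The saturation effect can be illustrated with Dempster's rule of combination on a binary frame. The sketch below is a generic textbook formulation, not the cited implementation; the mass values and the `(yes, no, unknown)` tuple layout are assumptions chosen for the example.

```python
def combine(m1, m2):
    """Dempster's rule on a binary frame.

    Each mass function is a tuple (yes, no, unknown) summing to 1, where
    `unknown` is the mass assigned to the whole frame of discernment.
    """
    y1, n1, u1 = m1
    y2, n2, u2 = m2
    conflict = y1 * n2 + n1 * y2
    if conflict >= 1.0:
        raise ValueError("totally conflicting evidence cannot be combined")
    norm = 1.0 - conflict
    yes = (y1 * y2 + y1 * u2 + u1 * y2) / norm
    no = (n1 * n2 + n1 * u2 + u1 * n2) / norm
    unknown = (u1 * u2) / norm
    return yes, no, unknown

# Start from total ignorance and fold in 50 weakly supportive measurements.
belief = (0.0, 0.0, 1.0)
for _ in range(50):
    belief = combine(belief, (0.6, 0.1, 0.3))
saturated = belief

# One strongly contradicting measurement now has almost no effect.
after_contradiction = combine(saturated, (0.1, 0.6, 0.3))
print(saturated[0], after_contradiction[0])
```

Once the mass on `unknown` has been driven close to zero, later evidence is weighted against an almost certain belief, which is one way to read the saturation problem reported above.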

2.1  Automatic  Compilation  of  Vision  Programs 


Typically, a computer vision system is essentially a custom solution to a specific problem, and is therefore expensive
to develop and install, and is capable of recognizing only a single part or a small number of parts under very special
conditions. Modifications to existing systems are difficult to make. Moreover, there is little transfer from one system
to another, and so each new system costs as much to develop as the first system did.

Recently, work in the area of automatic recognition program generation has addressed the problem of cost-effective
system development. A new paradigm, appearance-based vision [11], formalizes and automates the design process.
Appearance-based vision can be characterized as an automated process of analyzing the appearances of objects under
specified observation conditions, followed by the automatic generation of model-based object recognition programs
based on the preceding analysis.

An appearance-based system is known as a Vision Algorithm Compiler, or VAC. A VAC is highly modular. Both
objects and sensors are explicitly modeled, and are therefore exchangeable. Hence, a VAC can generate object
recognition code for many different objects using the same sensor model, or the set of objects can be fixed and the sensor
models varied.

A VAC incorporates a two-stage approach to object recognition. The first stage is executed off-line and consists of
analysis of predicted object appearances and the generation of object recognition code; the second stage is executed
on-line, and consists of applying the previously generated code to input images. The first stage is executed only once
for a given object recognition task, and can be relatively expensive. The second stage is executed many times, and
must be both fast and cost-effective. The high cost of the first stage is amortized over a large number of executions of
the second stage.

Goad [9] presented one of the first programs capable of automatically constructing an object recognition program. In
Goad’s system, an object is described by a list of edges and a set of visibility conditions for each edge. Visibility is
determined by checking visibility at a representative number of viewpoints obtained by tessellating the viewing
sphere. Object recognition is performed by a process of iteratively matching object and image edges until either a
satisfactory match is found, or the algorithm fails. The sequence of matchings is compiled during the off-line analysis
phase. Goad’s system was not completely automatic, however. Goad selected edges as the features to be used for
recognition, and the order of edge matching was specified by hand.

The 3DPO system of Bolles and Horaud [3] was built with the intended goal of using off-line analysis to produce the
fastest, most efficient on-line object recognition program possible. 3DPO utilized the local-feature-focus method, in
which a prominent focus feature is initially identified, and then secondary features predicted from the focus feature
are used to fine-tune the localization result. The system was not fully automatic, as the focus features and secondary
features were chosen by hand.

Ikeuchi and Kanade [15] first pointed out the importance of modeling sensors as well as objects in order to predict
appearances, and noted that the features that are useful for recognition depend on the sensor being used. Their system
predicts object appearances at a representative set of viewpoints obtained by tessellating the viewing sphere. The
appearances are grouped into equivalence classes with respect to the visible features; the equivalence classes are
called aspects. A recognition strategy is generated from the aspects and their predicted feature values, and is
represented as an interpretation tree. Each interpretation tree specifies the sequence of operations required to precisely
localize an object. The sequence of operations is broken up into two parts: the first part classifies an input image into
an instance of one of the aspects, while the second part determines the precise pose (position and orientation) of the
object within the specified aspect.

Hansen and Henderson [12] demonstrated a system that analyzed 3D geometric properties of objects and generated a
recognition strategy. The system was developed to make use of a range sensor for recognition. The system examines
object appearances at a representative set of viewpoints obtained by tessellating the viewing sphere. Geometric
features at each viewpoint are examined, and the properties of robustness, completeness, consistency, cost, and
uniqueness are evaluated in order to select a complete and consistent set of features. For each model, a strategy tree is
constructed, which describes the search strategy used to recognize and localize objects in a scene.

The system of Arman and Aggarwal [1] was designed to be capable of selecting the proper sensor for a given task.
Starting with a CAD model of an object, the system builds up a tree in which the root node represents the object, and
the leaves represent features (where features are dependent upon the sensor selected), and a path from the root to a
leaf passes through nodes representing increasing specificity. Each arc in the tree is weighted by a “reward potential”
that represents the likely gain from traversing that link. At run time, the system traverses the tree from the root to the
leaves, choosing the branch with the highest weight at each level, and backtracking when necessary.
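
The run-time traversal described above amounts to a best-first descent with backtracking. The sketch below shows that idea under stated assumptions: the dictionary-based tree layout, the weights, the node names, and the `detect` callback are all hypothetical and are not taken from Arman and Aggarwal's system.

```python
def traverse(node, detect, path=()):
    """Descend from `node`, trying the highest-weight arc first.

    Returns the path to the first leaf whose feature is detected, or None
    if every branch below `node` fails (the caller then backtracks).
    """
    path = path + (node["name"],)
    children = node.get("children", [])
    if not children:
        return path if detect(node["name"]) else None
    for _, child in sorted(children, key=lambda wc: wc[0], reverse=True):
        found = traverse(child, detect, path)
        if found is not None:
            return found
    return None

# Hypothetical tree: each child is (reward potential, subtree).
tree = {
    "name": "object",
    "children": [
        (0.7, {"name": "planar faces", "children": [
            (0.6, {"name": "large face"}),
            (0.4, {"name": "small face"}),
        ]}),
        (0.3, {"name": "cylinders", "children": [
            (1.0, {"name": "hole"}),
        ]}),
    ],
}

visible = {"small face", "hole"}  # features the (imaginary) sensor can find
result = traverse(tree, lambda name: name in visible)
print(result)
```

In this example the traversal first tries "large face", fails, and backtracks to "small face" before ever considering the lower-weight "cylinders" branch.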

The PREMIO system of Camps et al. [5] predicts object appearances under various conditions of lighting, viewpoint,
sensor, and image processing operators. Unlike other systems, PREMIO also evaluates the utility of each feature by
analyzing the detectability, reliability, and accuracy. The predictions are then used by a probabilistic matching
algorithm that performs the on-line process of identification and localization.

The BONSAI system of Flynn and Jain [7] identifies and localizes 3D objects in range images by comparing
relational graphs extracted from CAD models to relational graphs constructed from range image segmentation. The
system constructs the relational graphs off-line using two techniques: first, view-independent features are calculated
directly from a CAD model; second, synthetic images are constructed for a representative set of viewpoints obtained
by tessellating the viewing sphere, and the predicted areas of patches are determined and stored as an attribute of the
appropriate relational graph node. During the on-line recognition phase, an interpretation tree is constructed which
represents all possible matchings of the graph constructed from a range image and the stored model graph.
Recognition is performed by heuristic search of the interpretation tree.

Sato et al. [19] demonstrated a system for recognition of specular objects. During an off-line phase, the system
generates synthetic images from a representative set of viewpoints. Specularities are extracted from each image, and the
images are grouped into aspects according to shared specularities, and each specularity is evaluated in terms of its
detectability and reliability. A deformable template is also prepared for each aspect. At execution time, an input
image is classified into a few possible aspects using a continuous classification procedure based on Dempster-Shafer
theory. Final verification and localization is performed using deformable template matching.


2.2  The  Need  for  Resolution 


Hong et al. [13] extended the work of Ikeuchi and Kanade by optimizing the object recognition code generated by
their VAC. They noted that in many instances, objects have aspects that cannot be distinguished on the basis of
available features. They called these aspects congruent aspects. Since congruent aspects form equivalence classes with
respect to a feature set, they are grouped into larger sets called congruent classes. The linear shape change
determination process can sometimes overcome the ambiguity in aspects, but not always, and only at greatly increased
computational cost.

Congruent aspects appear to be a universally encountered, if not generally recognized, phenomenon of object
recognition systems. In the work of Hong et al., most of the objects for which recognition programs were generated exhibited
congruent aspects. More recently, Siebert and Waxman [21] reported an adaptive object recognition program and
noted that 75% of the aspects generated by their system were ambiguous. In human-designed systems, congruent
aspects exist, but are not noted; the vision system designer is specifically engaged in looking for a feature set that will
not be ambiguous, and is therefore not likely to notice the phenomenon as anything more than a failure of a specific
feature set. However, we speculate that the congruent aspect effect is responsible for the fact that virtually every
object recognition system uses a unique feature set. A little thought shows that for any given feature set, there exist
objects which have congruent aspects with respect to that feature set. Additionally, factors such as sensor noise and
occlusion can reduce the sensitivity of a feature set and add to ambiguity; in effect, these factors create congruent
aspects.

Given that congruent aspects cannot be avoided, how can they be handled? The typical approach, that followed by
most vision system designers, is simply to keep developing new feature sets. That is, when the current feature set
fails, resulting in congruent aspects, a new feature set is developed that is specialized for that object domain. Another
approach, which has received less attention, is to utilize multiple observations from different viewpoints, using
knowledge of the 3D shape and sensor positions to combine information from different observations.

3  Planning  Multiple  Observations

In the most general case of planning multiple observations, multiple objects would be considered, each of which
would have six degrees of freedom in pose, and the sensor involved would also have six degrees of freedom. The
resulting planning space would then be 12-dimensional. Gaining any intuition in a 12-dimensional space would be
difficult, and the details of representation and bookkeeping would likely obscure any generalizations. Therefore, to
gain insight into the problem, we restrict our attention initially to the simplest case, that in which the sensor and the
object each have one degree of freedom. In the next section, we build on that base and extend the planning
methodology to the case of three degrees of freedom in object and sensor. Additional extensions to still more general cases are
possible.

To start off, we consider the case in which the distance to the object remains fixed, the object has one axis with known
orientation about which it can rotate, and the sensor is constrained to rotate in a plane about the object perpendicular
to the known object axis. Hence, there is a single degree of freedom each in sensor motion and object pose; since
sensor and object rotate about the same axis, the system as a whole still has only one degree of freedom. Figure 2
illustrates this case.


Figure  2:  Object  Recognition  with  One  Degree  of  Freedom 


The one degree of freedom case described above is simple, but still similar to a variety of application domains. One
example scenario is that in which a mobile robot moves around an object in order to recognize it: the robot motion is
locally planar, the distance to the object remains fixed, and in many cases, such as automobiles and furniture, objects
have a known “upright” posture. Another application scenario is that of sensing the pose of an object from
measurements of object diameter made by tentatively grasping the object from above and measuring the gap between the
manipulator fingers; we call this finger gap sensing, and demonstrate planning in this domain in section 5.

As stated in the previous section, an aspect represents a characteristic view of an object. Intuitively, an aspect
corresponds to a contiguous set of viewpoints from which the object looks “more-or-less the same”. Aspect classification
is the process of classifying an input image into an instance of an aspect. Aspect classification is essentially a process
of rough localization, since it limits the possible object poses to those consistent with the observed aspect. For many
tasks, the rough localization determined by aspect classification is sufficient. In what follows, we assume that this is
the case, and consider the problem of localization to be equivalent to that of determining the aspect oriented in a
particular direction. For example, to grasp an object stably with a parallel jaw gripper, it is only necessary to align a
particular aspect of an object with a reference point of the gripper; given such an alignment, the object is guaranteed to
slide into position when contacted by the gripper [4].

In a computer vision system, aspects can be defined in a variety of ways. For example, aspects can be based on the set
of visible object surfaces, or on the range of a specific feature. Aspects can be characterized by determining the
distribution of feature values over all the viewpoints within each aspect. Aspects with feature distributions that cannot be
distinguished are congruent. We refer to a set of congruent aspects as an aspect class.
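
As a rough illustration of grouping aspects into aspect classes, the sketch below treats two aspects as congruent when the ranges of a single scalar feature overlap. This overlap test is a crude stand-in for comparing feature distributions and is an assumption of the example, as are the function name `group_congruent` and the feature ranges.

```python
def group_congruent(aspects):
    """Group aspects whose feature ranges overlap into aspect classes.

    `aspects` maps an aspect id to the (low, high) range of one scalar
    feature over the viewpoints of that aspect. Overlapping ranges are
    chained into the same class.
    """
    classes = []
    group_high = None
    for aspect_id, (low, high) in sorted(aspects.items(), key=lambda kv: kv[1]):
        if group_high is not None and low <= group_high:
            classes[-1].append(aspect_id)      # cannot be told apart
            group_high = max(group_high, high)
        else:
            classes.append([aspect_id])        # starts a new aspect class
            group_high = high
    return classes

# Hypothetical feature ranges (e.g. visible area) for five aspects.
feature_ranges = {
    0: (1.0, 1.2),
    1: (2.0, 2.4),
    2: (3.1, 3.5),
    6: (2.1, 2.3),
    7: (1.1, 1.3),
}
aspect_classes = group_congruent(feature_ranges)
print(aspect_classes)
```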

Figure 3 illustrates a simple example. Suppose that the shape at the top is the projection of a solid 3D object which is
to be viewed from within the plane of the page, and assume orthographic projection. Aspects of the object have been
defined based on the set of visible surfaces. Numbering surfaces counter-clockwise from the right side, aspect 0 is
defined by the set of viewpoints for which surfaces 0 and 1 are visible. For aspect 1, surfaces 0, 1, and 2 are visible.
For aspect 2, surfaces 1 and 2 are visible. The rest of the aspects are defined by noting where surfaces appear and
disappear. Now, assuming that the area of each surface is the only available feature, surfaces 1 and 4 are
indistinguishable, as are surfaces 2 and 3. The resulting aspect classes are labeled in the figure.

Clearly, if the feature distributions of two aspects cannot be distinguished, then additional sensor observations from
the same viewpoint will contribute nothing to the problem of distinguishing between the aspects. However, the
problem changes when sensor movement is allowed. Unless an object is perfectly symmetrical, the relative position of
other aspects will be different for different aspects. In principle, then, two congruent aspects can be distinguished by
moving the sensor to another location from which different observations will result, depending on the identity of the
original aspect. We call this process aspect resolution, and refer to the distinguishing observation as a resolving
observation. For nearly symmetrical objects, it may take more than one sensor move and observation to resolve
congruent aspects; instead, it may require a long sequence of sensor moves and observations, referred to as a resolving
sequence.
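
One way to picture a resolving observation is to search for sensor moves after which two congruent aspects are guaranteed to produce different aspect classes, wherever within each aspect the sensor started. The following is a minimal sketch under assumed data: the eight-aspect object, its classes, and the one-degree discretization are hypothetical and are not the object of Figure 3.

```python
# Hypothetical object: (aspect id, angular extent in degrees, aspect class).
ASPECTS = [(0, 45, 0), (1, 45, 1), (2, 45, 2), (3, 45, 3),
           (4, 45, 3), (5, 45, 2), (6, 45, 1), (7, 45, 0)]

def class_by_angle(aspects):
    """Aspect class seen at each whole-degree viewing angle."""
    table = []
    for _, extent, cls in aspects:
        table.extend([cls] * extent)
    return table

def angles_of(aspects, aspect_id):
    """Whole-degree viewing angles covered by one aspect."""
    start = 0
    for ident, extent, _ in aspects:
        if ident == aspect_id:
            return range(start, start + extent)
        start += extent
    raise KeyError(aspect_id)

def classes_after_move(aspects, aspect_id, offset):
    """Classes that might be observed after moving `offset` degrees,
    starting from an unknown angle inside the given aspect."""
    table = class_by_angle(aspects)
    return {table[(a + offset) % len(table)]
            for a in angles_of(aspects, aspect_id)}

def resolving_offsets(aspects, first, second):
    """Moves after which the two aspects can never yield the same class."""
    total = sum(extent for _, extent, _ in aspects)
    return [d for d in range(1, total)
            if classes_after_move(aspects, first, d).isdisjoint(
                classes_after_move(aspects, second, d))]

offsets = resolving_offsets(ASPECTS, 1, 6)
print(offsets[0])  # smallest guaranteed resolving move, in degrees
```

For this made-up object, any move shorter than the 45° extent of the starting aspect may leave the sensor inside the same aspect class, so no such move is guaranteed to resolve the pair.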

To make these definitions intuitive, consider again the example illustrated in Figure 3. The object in question has four
congruent aspect classes. The first observation is of class-1, which limits the possible object poses to those
corresponding to either aspect-1 or aspect-6, shown shaded at the bottom of the figure. The second observation, taken
45° from the first, is of class-0. The combination of class-1 followed after 45° by class-0 limits the object pose to the
set shown shaded in the figure; that is, it is now known that the object is positioned in such a way that aspect-6 was
initially visible. In this case, aspect resolution has restricted the object’s pose to be one consistent with an initial
observation of aspect-6.

The example shown in Figure 3 is fairly simple, and it is easy for a human observer to determine one or more resolving moves for any of the aspect classes. In fact, for many three-dimensional cases, the determination of resolving moves is relatively easy for a human to perform. However, the problem is not immediately solvable automatically. Several key issues have to be addressed to automate planning for aspect resolution. These issues include the following:

•  object  representation 

The object must be represented internally in such a way as to make explicit such object properties as the geometric extents of, and relations between, observation classes. For example, in Figure 3, the requisite knowledge included the facts that both aspect-1 and aspect-6 spanned 45° and belonged to the same congruent aspect class. Furthermore, it was necessary to know the extents and classes of all the other aspects of the object.

•  representation  of  uncertainty 

A given observation may not constrain the pose of the object by much, and so it is not possible to move to particular positions in order to perform observations. For example, in Figure 1, the first observation only constrained the object pose to within two sets, spanning a total of 90°, and any motion of less than 45° would not be guaranteed to provide useful information. Moreover, even after resolution, the uncertainty was only reduced to a set spanning 45°. Therefore, every move must be made with respect to the current uncertainty in position. The internal representation used to plan moves must explicitly represent uncertainty in position.

Figure 3: Resolution of Congruent Aspects

•  selection  of  moves 

There are an infinite number of moves that can be made from any location. It is clearly impossible to search through all moves to select the best. A system for planning resolving moves must have some consistent means of selecting potential moves from the infinite number of possibilities.

Each  of  the  issues  above  is  addressed  in  the  sections  below. 


3.1  Object  Representation 


Aspects were originally defined in [16] as topologically equivalent classes of object appearances. More intuitively, an aspect can be considered to be a continuous collection of viewpoints yielding object appearances that all look the same. An example is illustrated in Figure 4, in which only equatorial aspects are considered. In the figure, aspects are defined by the set of visible faces. Note that if only geometric, and not relational, information is considered, then there are 4 sets of congruent aspects: (1,8), (2,7), (3,6), and (4,5).

Figure 4: Equatorial Aspects for Polyhedral Solid, hexobj


In our domain, the important information about an object that must be made explicit in the object model includes: the geometric extent of each aspect, geometric relations between aspects, and descriptions of aspects in terms of feature values. Since our domain is only two dimensional, rather than employ the full aspect graph, as presented in [16], we represent the information in the form of an aspect diagram. The aspect diagram is like a pie chart; it divides the viewing circle up into pie-shaped wedges, each wedge representing an aspect. The extent of each aspect is represented by the size of the wedge, and relations between aspects can be computed directly from extent and ordering information.

Each aspect can be characterized in terms of the feature values associated with it. That is, a single view of an object can be described in terms of the features extracted from that view. An aspect can then be characterized by the range of feature values extracted from the constituent views of the aspect. The range of values can either be computed analytically, using object and sensor models, or can be approximated by sampling appearances at selected viewpoints.

Given the ranges of feature values for an aspect, there is a straightforward procedure to generate an aspect classification program. Starting with the set of all aspects, each feature is examined to see if the set can be partitioned on the basis of the values of that feature. Each resulting subset is then tested against the remaining features, and the procedure is iterated until either each aspect is uniquely determined, or no more features can be applied. The non-singleton aspect sets remaining at the end of the process are congruent aspect sets, since they cannot be distinguished using the available features. The congruent aspect sets define the aspect classes. Figure 5 illustrates the aspect diagram corresponding to the object and aspects from Figure 4. The diagram clearly depicts the extent, relative location, and class of each aspect.
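The partitioning procedure described above can be sketched in Python. This is only an illustration under stated assumptions: the function names, the representation of each feature as a (low, high) range per aspect, and the rule that two aspects are inseparable by a feature when their ranges overlap are all hypothetical, and the area values are invented for the four-surface example.

```python
from typing import Dict, List, Tuple

Range = Tuple[float, float]  # (low, high) values of one feature over an aspect


def split_by_feature(group: List[str], ranges: Dict[str, Range]) -> List[List[str]]:
    """Split a group of aspects wherever this feature's value ranges leave a gap."""
    ordered = sorted(group, key=lambda aspect: ranges[aspect][0])
    subgroups: List[List[str]] = []
    running_high = float("-inf")
    for aspect in ordered:
        low, high = ranges[aspect]
        if subgroups and low <= running_high:
            subgroups[-1].append(aspect)      # ranges overlap: not separable here
            running_high = max(running_high, high)
        else:
            subgroups.append([aspect])        # gap found: start a new subgroup
            running_high = high
    return subgroups


def classify_aspects(features: Dict[str, Dict[str, Range]]) -> List[List[str]]:
    """Refine the set of all aspects feature by feature until nothing changes."""
    groups = [sorted(next(iter(features.values())))]
    changed = True
    while changed:
        changed = False
        for ranges in features.values():
            refined = [sub for g in groups for sub in split_by_feature(g, ranges)]
            changed = changed or len(refined) != len(groups)
            groups = refined
    return sorted(sorted(g) for g in groups)


# Hypothetical data: surface area is the only available feature.
FEATURES = {"area": {"s1": (9.5, 10.5), "s2": (19.0, 21.0),
                     "s3": (19.5, 20.5), "s4": (9.8, 10.2)}}
CLASSES = classify_aspects(FEATURES)
CONGRUENT_SETS = [g for g in CLASSES if len(g) > 1]
```

With the invented areas, surfaces s1 and s4 end up in one class and s2 and s3 in another, so both classes are congruent sets.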
Figure 5: Aspect Diagram for Equatorial Aspects of Hexobj

3.2  Representation  for  Planning 


3.2.1  The  Observation  Function  and  Observation  Graph 

Let $\psi_0$ be a position on the viewing circle around an object. We can define an observation function that relates the angular displacement from $\psi_0$ to the observed aspect class with respect to the object coordinate system:

$$\Omega_{\psi_0} : (\Psi \rightarrow \Gamma) \qquad (1)$$

where $\Psi$ denotes the set of relative viewing positions, and $\Gamma = \{\gamma_1, \ldots, \gamma_n\}$ denotes the set of aspect classes. For example, for the aspect diagram of Figure 5, the following equalities hold: $\Omega(0^\circ) = \text{class-0}$, and $\Omega(120^\circ) = \text{class-3}$. Pictorially, we can illustrate the observation function for the aspect graph of Figure 5 as the one-dimensional labeled line segment shown in Figure 6. The graph extends infinitely to both sides, and is periodic modulo $2\pi$.


Figure 6: Graph of Observation Function


Note that the observation function defined above, and illustrated in the figure, depends on knowing the object coordinate system, and specifying displacements with respect to that coordinate system. In object localization problems, one of the goals is to determine the object coordinate system. In the case being considered here, the relationship between object and world coordinates can be specified by a single parameter, $\theta$, which is the angular displacement of the object from its base position. Alternatively, $\theta$ can be regarded as the relative rotation between object and sensor. The graph of the observation function illustrated in Figure 6 is for a value of $\theta = 0$. The graph of the function for $\theta = 30^\circ$ would be the graph of Figure 6 shifted 30° to the left.
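As an illustration only, the observation function and the effect of the pose offset can be sketched in Python. The aspect diagram below is a hypothetical six-wedge example (it is not hexobj), and the names `DIAGRAM`, `wedge_at`, and `observe` are assumptions. The sketch also assumes that the view at displacement $\psi$ of an object rotated by $\theta$ corresponds to the diagram angle $\psi + \theta$, which reproduces the leftward shift described above.

```python
from typing import List, Tuple

Wedge = Tuple[int, int, str, str]  # (start_deg, end_deg, aspect, aspect_class)

# Hypothetical aspect diagram: classes c0, c1, c2 each contain two congruent aspects.
DIAGRAM: List[Wedge] = [
    (0, 90, "a0", "c0"), (90, 135, "a1", "c1"), (135, 180, "a2", "c2"),
    (180, 270, "a3", "c0"), (270, 315, "a4", "c2"), (315, 360, "a5", "c1"),
]


def wedge_at(angle_deg: float) -> Wedge:
    """Return the wedge of the diagram containing the angle (periodic modulo 360)."""
    angle = angle_deg % 360
    for wedge in DIAGRAM:
        if wedge[0] <= angle < wedge[1]:
            return wedge
    raise ValueError("diagram does not cover the viewing circle")


def observe(psi: float, theta: float = 0.0) -> str:
    """Aspect class seen at sensor displacement psi for an object rotated by theta."""
    return wedge_at(psi + theta)[3]
```

For example, `observe(70, 30)` returns the same class as `observe(100)`, since a pose offset of 30° shifts the one-dimensional graph 30° to the left.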

Thus, for a known displacement of world to object coordinates, the observation function is one-dimensional, and reflects the fact that perfect positional information can be used to accurately predict the results of sensor observations. However, perfect positional information is rarely, if ever, known. In particular, the object pose is never known exactly at the start of an object localization task. Hence, we need a way to include uncertainty in the object pose in the observation function; this can be done by adding an extra dimension to the observation function. The general observation function for the one degree-of-freedom recognition problem therefore becomes:

$$\Omega : (\Psi \times \Theta \rightarrow \Gamma) \qquad (2)$$

where $\Psi$ and $\Gamma$ are defined as in (1), and $\Theta$ denotes the set of possible object poses. Hence, the expression $\Omega(\psi, \theta)$ denotes the aspect class observed by moving the sensor $\psi$ degrees with respect to the origin of the object coordinate system rotated $\theta$ degrees from world coordinates. The graph of the observation function is now two-dimensional, and we refer to it as the observation graph. The observation graph corresponding to the object in Figure 4 is illustrated in Figure 7 for $0^\circ \le \psi, \theta < 360^\circ$.


Figure 7: Observation Graph for Hexobj for $0^\circ \le \psi, \theta < 360^\circ$


Any horizontal line drawn through the figure would yield the observation graph for the object in a particular pose. For example, horizontal lines drawn through the top and bottom of the figure would be identical to the graph in Figure 6, which is the observation graph for the object at its base position. The vertical axis represents uncertainty in object pose. For example, any position on the $\psi$ axis represents a sensor position. The vertical line through that position represents the ordering of the set of possible observations as a function of object pose. Thus, if the object pose was known to within 30°, and the sensor was at a known position, then the vertical line segment at the sensor position extending 30° above and below the approximate pose would represent the set of possible observations resulting from that uncertainty in pose.
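The vertical-segment reading of the observation graph can be sketched as follows. This is a hedged illustration, not the authors' implementation: the six-wedge diagram is hypothetical (not hexobj), the function names are assumptions, the view is assumed to depend only on $\psi + \theta$, and the segment is sampled at one-degree steps rather than treated analytically.

```python
from typing import List, Set, Tuple

# Hypothetical aspect diagram: (start_deg, end_deg, aspect, aspect_class).
DIAGRAM: List[Tuple[int, int, str, str]] = [
    (0, 90, "a0", "c0"), (90, 135, "a1", "c1"), (135, 180, "a2", "c2"),
    (180, 270, "a3", "c0"), (270, 315, "a4", "c2"), (315, 360, "a5", "c1"),
]


def observe(psi: int, theta: int) -> str:
    """Aspect class at sensor position psi and object pose theta (degrees)."""
    angle = (psi + theta) % 360
    for start, end, _aspect, aspect_class in DIAGRAM:
        if start <= angle < end:
            return aspect_class
    raise ValueError("diagram does not cover the viewing circle")


def possible_observations(psi: int, nominal_theta: int, tolerance: int) -> Set[str]:
    """Classes met along the vertical segment at psi spanning nominal_theta +/- tolerance."""
    poses = range(nominal_theta - tolerance, nominal_theta + tolerance + 1)
    return {observe(psi, theta) for theta in poses}
```

With the pose known only to within 30° of 0°, `possible_observations(30, 0, 30)` contains a single class, whereas `possible_observations(80, 0, 30)` contains two, so an observation at 80° would carry information about the pose.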

Let the resolution function that maps sensor position and object pose onto (sensed) aspect identity be denoted:

$$\Phi : (\Psi \times \Theta \rightarrow A) \qquad (3)$$

where $A = \{\alpha_1, \ldots, \alpha_m\}$ denotes the aspects of an object. Given a sensor position and an object pose, the resolution function identifies the underlying aspect. The output of the process of aspect resolution is exactly the value $\Phi(0^\circ, \theta)$, where $\theta$ is the pose of the object, hence the name given to the function $\Phi$.

The observation function can be restricted in various ways to define observation functions for various subsets of $\psi$ and $\theta$. The aspect-restricted observation function is defined by:

$$\Omega_{\alpha_j}(\psi, \theta) = \Omega(\psi, \theta) \,\big|_{\Phi(0, \theta) = \alpha_j} \qquad (4)$$

Graphically, an aspect-restricted observation function is represented by a single horizontal strip in the region $0 \le \theta < 2\pi$ that is periodic modulo $2\pi$. The width of the strip corresponds to the width of the defining aspect. Intuitively, the aspect-restricted observation function represents the collection of observations that could result if $\alpha_j$ was the aspect underlying the first observation. It represents the situation that results when a particular aspect has been identified, but the exact pose of the object is unknown. The results of additional observations can be predicted to within an uncertainty defined by the size of the aspect.

Similarly, the class-restricted observation function is defined by:

$$\Omega_{\gamma_j}(\psi, \theta) = \Omega(\psi, \theta) \,\big|_{\Omega(0, \theta) = \gamma_j} \qquad (5)$$

The class-restricted observation function represents the collection of observations that could result if the initial observation was of class $\gamma_j$. A class-restricted observation function can be represented graphically as a collection of horizontal strips in the region $0 \le \theta < 2\pi$ that is periodic modulo $2\pi$. The number of strips in each period is exactly the number of aspects belonging to class $\gamma_j$, and the width of each strip corresponds to the width of the defining aspect. The class-restricted observation function represents the situation that results when a particular aspect class has been observed, but the particular aspect is unknown. In this case, the results of additional observations can be predicted to within an uncertainty defined by the sizes of the constituent aspects.
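A minimal sketch of the class-restricted observation function follows, assuming a hypothetical six-wedge diagram (not hexobj) and assuming that the view depends only on $\psi + \theta$. Under that assumption the pose strips consistent with an initial observation of a class are simply the wedges of the aspects in that class. The names `class_strips` and `class_restricted` are illustrative.

```python
from typing import List, Optional, Tuple

# Hypothetical aspect diagram: (start_deg, end_deg, aspect, aspect_class).
DIAGRAM: List[Tuple[int, int, str, str]] = [
    (0, 90, "a0", "c0"), (90, 135, "a1", "c1"), (135, 180, "a2", "c2"),
    (180, 270, "a3", "c0"), (270, 315, "a4", "c2"), (315, 360, "a5", "c1"),
]


def observe(psi: float, theta: float) -> str:
    """Aspect class at sensor position psi and object pose theta (degrees)."""
    angle = (psi + theta) % 360
    for start, end, _aspect, aspect_class in DIAGRAM:
        if start <= angle < end:
            return aspect_class
    raise ValueError("diagram does not cover the viewing circle")


def class_strips(gamma: str) -> List[Tuple[int, int, str]]:
    """Pose strips (theta_start, theta_end, aspect) on which observe(0, theta) == gamma."""
    return [(start, end, aspect) for start, end, aspect, c in DIAGRAM if c == gamma]


def class_restricted(psi: float, theta: float, gamma: str) -> Optional[str]:
    """Class-restricted observation function; None where it is undefined."""
    if observe(0, theta) != gamma:
        return None
    return observe(psi, theta)
```

Here `class_strips("c0")` yields two strips, one per aspect of the class, each as wide as its defining aspect.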

In summary, then, the observation function is a representation of the relationship of object pose and sensor position to sensor observation. The observation graph is a pictorial representation of the observation function, in which the horizontal axis represents sensor position, and the vertical axis represents object pose. A horizontal line through the observation graph represents the variation in sensor observations that would result from sensor motion relative to a particular object pose. A vertical line through the observation graph represents the variation in sensor observations that would result from object rotation relative to a particular sensor position. The aspect-restricted observation function represents the uncertainty in sensor observations when the sensor is moved with respect to the observation of a particular aspect. The class-restricted observation function represents the uncertainty in sensor observations when the sensor is moved with respect to the observation of a particular aspect class.

3.2.2  Aspect  Resolution  and  the  Observation  Function 

In what follows, we will cease to worry about the periodicity of the observation function and concern ourselves with the primary interval for which $0 \le \psi, \theta < 2\pi$. It is convenient to use modulo arithmetic and consider the observation function to exhibit wraparound.

The goal of object localization in our restricted domain is to determine the exact value of $\theta_0$, the orientation of the object with respect to world coordinates. Mathematically, object localization restricts the observation function to the one-dimensional function $\Omega(\psi, \theta_0)$. Equivalently, object localization is the restriction of the observation graph to a single horizontal line corresponding to $\theta_0$.

The goal of aspect resolution is less ambitious than that of object localization. Rather than determining an exact value of $\theta_0$, aspect resolution determines a range of orientations $(\theta_l, \theta_h)$ and an aspect $\alpha_i$ such that aspect $\alpha_i$ is guaranteed to underlie any observation in the interval; mathematically, $\forall \theta \in (\theta_l, \theta_h),\ \exists!\, i : \Phi(0, \theta) = \alpha_i$. Equivalently, aspect resolution is the restriction of the observation graph to a horizontal strip that is a subset of the strip defining the aspect-restricted observation function.

The tool which can be applied to perform aspect resolution is a sensor observation, which yields the class of the underlying aspect. Since an observation yields only class information, a single observation reduces the observation function to some class-restricted observation function. If no congruent aspects are present, then each class contains only a single aspect, so the class-restricted observation function is equivalent to an aspect-restricted observation function, and the task is completed. More generally, however, in the presence of congruent aspects the class-restricted observation function is equivalent to the union of the aspect-restricted observation functions corresponding to the aspects making up the class. Graphically, an observation reduces the observation graph to one or more horizontal strips.

To formalize this concept somewhat, define the observation-restricted observation function by:

$$\Omega_{\psi_i, \gamma_j}(\psi, \theta) = \Omega(\psi, \theta) \,\big|_{\Omega(\psi_i, \theta) = \gamma_j} \qquad (6)$$

Intuitively, the observation-restricted observation function is the subset of the observation function that is consistent with a particular observation. For example, if an observation made with a sensor displacement of $\psi_0$ yielded an observation of $\gamma_j$, then the restriction of the observation function consistent with this observation would be defined only for those $\theta$ for which $\Omega(\psi_0, \theta) = \gamma_j$. Figure 8 illustrates a restriction of the observation graph of Figure 7 for the observation $\Omega(90^\circ, \theta) = \text{class-3}$. This observation-restricted function corresponds to a subset of the union of the aspect-restricted functions $\Omega_{\text{aspect-5}}$, $\Omega_{\text{aspect-6}}$, $\Omega_{\text{aspect-8}}$, and $\Omega_{\text{aspect-0}}$.


Each observation restricts the observation function to a subset that is consistent with that observation. If the relationships between multiple observations are known, then multiple restrictions can be applied to the observation function. Graphically, the operation is that of intersecting the strips resulting from the individual operations. When the multiple restrictions result in a subset of the aspect-restricted observation function for a single aspect, then aspect resolution has been completed. For example, the observation of $\Omega(90^\circ, \theta) = \text{class-3}$ results in the observation-restricted observation graph of Figure 8. A second observation of $\Omega(210^\circ, \theta) = \text{class-0}$ would result in a single horizontal strip which would correspond to the union of the two aspect-restricted functions $\Omega_{\text{aspect-5}}$ and $\Omega_{\text{aspect-6}}$. The observations at 90° and 210° thus fail to resolve the aspects, and another observation is needed. Instead of 210°, a better position for a second observation is 300°; then, whatever class is observed provides sufficient information to resolve the aspects: an observation of $\Omega(300^\circ, \theta) = \text{class-2}$ yields a subset of the aspect-restricted observation function $\Omega_{\text{aspect-6}}$; $\Omega(300^\circ, \theta) = \text{class-3}$ yields $\Omega_{\text{aspect-5}}$; $\Omega(300^\circ, \theta) = \text{class-1}$ yields $\Omega_{\text{aspect-0}}$; $\Omega(300^\circ, \theta) = \text{class-0}$ yields $\Omega_{\text{aspect-8}}$.
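The intersection of restrictions can be illustrated with a small Python sketch. It does not reproduce the hexobj example: the six-wedge diagram is hypothetical, poses are discretized to whole degrees, the view is assumed to depend only on $\psi + \theta$, and the names `consistent_poses` and `resolved_aspect` are assumptions. As in the text, one choice of second observation fails to resolve the aspects and another succeeds.

```python
from typing import List, Optional, Set, Tuple

Wedge = Tuple[int, int, str, str]  # (start_deg, end_deg, aspect, aspect_class)
DIAGRAM: List[Wedge] = [
    (0, 90, "a0", "c0"), (90, 135, "a1", "c1"), (135, 180, "a2", "c2"),
    (180, 270, "a3", "c0"), (270, 315, "a4", "c2"), (315, 360, "a5", "c1"),
]


def wedge_at(angle_deg: int) -> Wedge:
    angle = angle_deg % 360
    for wedge in DIAGRAM:
        if wedge[0] <= angle < wedge[1]:
            return wedge
    raise ValueError("diagram does not cover the viewing circle")


def observe(psi: int, theta: int) -> str:
    return wedge_at(psi + theta)[3]


def consistent_poses(psi: int, gamma: str) -> Set[int]:
    """Observation restriction: poses theta for which observe(psi, theta) == gamma."""
    return {theta for theta in range(360) if observe(psi, theta) == gamma}


def resolved_aspect(poses: Set[int]) -> Optional[str]:
    """Aspect underlying the first view if all remaining poses agree on it, else None."""
    aspects = {wedge_at(theta)[2] for theta in poses}
    return next(iter(aspects)) if len(aspects) == 1 else None


FIRST = consistent_poses(0, "c0")              # two strips: a0 and a3
FAILS = FIRST & consistent_poses(90, "c1")     # still spans both aspects
WORKS = FIRST & consistent_poses(45, "c1")     # lies inside the a0 strip
```

Set intersection plays the role of intersecting strips; `resolved_aspect` tests whether the result lies within a single aspect-restriction.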

A procedure for using the observation function to perform aspect resolution now suggests itself:

    construct the observation function for the object
    perform a sensing operation
    construct the observation-restricted observation function corresponding to the observation
    while (the observation-restriction is not a subset of any aspect-restriction) do
        select a new sensor position and move the sensor
        perform a sensing operation
        construct a new observation-restriction by updating the old
    done
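The procedure above can be sketched in Python against a simulated sensor. Everything here is illustrative: the six-wedge diagram is hypothetical (not hexobj), poses are sampled at whole degrees, and move selection is replaced by a fixed list of candidate positions supplied by the caller, which is a placeholder for the planning discussed later.

```python
from typing import Callable, Iterable, List, Optional, Set, Tuple

Wedge = Tuple[int, int, str, str]  # (start_deg, end_deg, aspect, aspect_class)
DIAGRAM: List[Wedge] = [
    (0, 90, "a0", "c0"), (90, 135, "a1", "c1"), (135, 180, "a2", "c2"),
    (180, 270, "a3", "c0"), (270, 315, "a4", "c2"), (315, 360, "a5", "c1"),
]


def wedge_at(angle_deg: int) -> Wedge:
    angle = angle_deg % 360
    for wedge in DIAGRAM:
        if wedge[0] <= angle < wedge[1]:
            return wedge
    raise ValueError("diagram does not cover the viewing circle")


def observe(psi: int, theta: int) -> str:
    return wedge_at(psi + theta)[3]


def consistent_poses(psi: int, gamma: str) -> Set[int]:
    return {theta for theta in range(360) if observe(psi, theta) == gamma}


def resolved_aspect(poses: Set[int]) -> Optional[str]:
    aspects = {wedge_at(theta)[2] for theta in poses}
    return next(iter(aspects)) if len(aspects) == 1 else None


def resolve(sense: Callable[[int], str],
            moves: Iterable[int]) -> Tuple[Optional[str], List[int]]:
    """Sense, restrict, and repeat until one aspect remains or the moves run out."""
    poses = consistent_poses(0, sense(0))            # first observation
    visited = [0]
    for psi in moves:
        if resolved_aspect(poses) is not None:
            break
        poses &= consistent_poses(psi, sense(psi))   # update the old restriction
        visited.append(psi)
    return resolved_aspect(poses), visited


def simulated_sensor(true_theta: int) -> Callable[[int], str]:
    """Stand-in for a real sensor: reports the class seen for a hidden pose."""
    return lambda psi: observe(psi, true_theta)
```

For a hidden pose of 200°, `resolve(simulated_sensor(200), [45, 90])` needs both moves, whereas a hidden pose of 60° is resolved after the first move.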
The key problems that yet remain are:

•  How can the observation graph be used for planning aspect resolution?

•  How  should  new  sensor  positions  be  selected? 

These  problems  are  addressed  in  the  next  subsection. 


3.3  Planning  with  the  Observation  Graph 

Thus far, we have shown that any given feature set will exhibit a phenomenon known as congruent aspects. Congruent aspects are distinct aspects with indistinguishable feature sets and hence belong to the same aspect class. Aspect resolution is the process of disambiguating congruent aspects so that the particular aspect originally observed is known uniquely.

Aspect resolution can be performed relatively simply by moving the sensor to different positions and keeping track of the observations. The pattern and displacements of the observations can provide sufficient information to disambiguate congruent aspects. The problem then becomes one of picking the positions of subsequent observations and combining the information properly. The problem is complicated by the fact that the position of the sensor with respect to the object is never known exactly. Instead, a range of possible positions is known.

The  observation  function  is  a  representation  that  relates  uncertainty  in  object  pose  and  sensor  position  to  sensor 
observations.  In  particular,  restrictions  of  the  observation  function  can  be  constructed  to  yield  the  exact  pattern  of 
sensor  observations  with  respect  to  particular  aspects,  classes,  or  even  individual  observations.  Using  the  observation 
function,  it  is  possible  to  predict,  with  known  uncertainty,  the  range  of  possible  observations  that  might  result  from  a 
sensor  operation  executed  at  a  particular  position. 

The observation function therefore provides all the information needed to plan moves to perform aspect resolution. However, the observation function is a mathematical construct which must be made more concrete before being usable. A more useful form of the observation function is the observation graph, which is a two-dimensional, graphical representation of the observation function. The previous subsection showed several examples of observation graphs. This section will show how to use observation graphs to plan resolving sequences of sensor moves.

We assume that the first observation defines the 0° sensor position, so all moves will be made with respect to this position. The outcome of the first observation is a restriction of the observation graph to be consistent with the observed class. Figure 9 illustrates the restriction that is consistent with an initial observation of class-3.

Figure 9: Observation Graph for Hexobj Restricted to $\Omega(0^\circ, \theta) = \text{class-3}$ (key: class-0 through class-4)


Each observation made results in an additional restriction of the observation graph. What is needed, then, is a way to select sensor positions, or moves, in such a way as to result in a useful restriction. The problem is that there is potentially an infinite number of moves.

An examination of the structure of observation graphs simplifies the move selection problem. The observation function is discontinuous, with the discontinuities occurring at the boundaries between different observation results. These boundaries are linear, and span the extent of the observation graphs, and therefore span any restrictions. Let the points along the border of restricted observation graphs at which the observation function is discontinuous be referred to as nodal points. Nodal points can be ordered with respect to their $\psi$ coordinates. Consider an observation graph G, with nodal points $n_1 < n_2 < \ldots < n_p$. The linear structure of the observation graph guarantees that, for any pair of adjacent nodal points $n_i$ and $n_{i+1}$, the set of observations $\{\Omega(\psi, \theta)\}$ possible is the same for any $\psi$ such that $n_i < \psi < n_{i+1}$. Graphically, this means that any vertical line through the observation graph between adjacent nodal points will intersect the same regions; only the relative extent of the portion of the regions underlying the lines will vary.
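A hedged sketch of nodal-point computation follows. It assumes a hypothetical six-wedge diagram (not hexobj) and assumes that the view depends only on $\psi + \theta$, so that class boundaries are the lines $\psi + \theta = b$ for each wedge boundary $b$; such a line meets a strip border $\theta = e$ at $\psi = b - e$. The function names and the one-degree sampling in `outcomes` are illustrative choices.

```python
from typing import FrozenSet, List, Tuple

Wedge = Tuple[int, int, str, str]  # (start_deg, end_deg, aspect, aspect_class)
DIAGRAM: List[Wedge] = [
    (0, 90, "a0", "c0"), (90, 135, "a1", "c1"), (135, 180, "a2", "c2"),
    (180, 270, "a3", "c0"), (270, 315, "a4", "c2"), (315, 360, "a5", "c1"),
]


def wedge_at(angle_deg: int) -> Wedge:
    angle = angle_deg % 360
    for wedge in DIAGRAM:
        if wedge[0] <= angle < wedge[1]:
            return wedge
    raise ValueError("diagram does not cover the viewing circle")


def observe(psi: int, theta: int) -> str:
    return wedge_at(psi + theta)[3]


def nodal_points(strips: List[Tuple[int, int]]) -> List[int]:
    """Sensor positions where a class boundary crosses the border of a pose strip."""
    boundaries = {start for start, _end, _aspect, _class in DIAGRAM}
    edges = {edge for strip in strips for edge in strip}
    return sorted({(b - edge) % 360 for b in boundaries for edge in edges})


def outcomes(psi: int, strips: List[Tuple[int, int]]) -> FrozenSet[Tuple[str, str]]:
    """(initial aspect, observed class) pairs possible at psi, sampled per degree."""
    return frozenset((wedge_at(theta)[2], observe(psi, theta))
                     for low, high in strips for theta in range(low, high))


C0_STRIPS = [(0, 90), (180, 270)]      # restriction after first observing class c0
MOVES = nodal_points(C0_STRIPS)
```

Between adjacent nodal points the possible outcomes do not change: `outcomes(10, C0_STRIPS)` equals `outcomes(40, C0_STRIPS)`, while crossing the nodal point at 45° changes them.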

Since the information between nodal points is constant, only nodal points need be considered when selecting sensor moves. Thus, an infinite number of moves has been condensed down into a finite, although possibly large, set of moves. Each distinct nodal point represents a possible sensor move. Each observation possible for a given move specifies a restriction of the observation graph. Each such restriction can be examined to see if it results in aspect resolution. Additional moves can be examined by examining the observations possible at the new nodal points resulting from each new restriction. Figure 10 illustrates the procedure on hexobj for moves at 180° and 240°. As seen in the figure, the move at 180° fails to resolve the aspects: either of the two possible observations can result starting from either aspect. The move at 240°, however, does result in aspect resolution.


Figure 10: Two Possible Moves Following $\Omega(0^\circ, \theta) = \text{class-3}$


While Figure 10 depicts only two of the possible moves starting from $\Omega(0^\circ, \theta) = \text{class-3}$, it illustrates the tree structure of the collection of possible move sequences. We call this tree the observation tree. Starting from any observation graph, there exists a set of moves defined by the nodal points of the graph. Each of these moves can result in a limited number of observations, each of which gives rise to a new, more restricted, observation graph. Each new observation graph has an associated set of nodal points that define the moves that lead to further restrictions, and so on, recursively. Since the objective of using multiple observations is to perform aspect resolution, the branching process can be terminated at any restriction which is resolved. Hence, leaf nodes represent the resolution of some aspect. Some sequences of moves never result in aspect resolution, and the branching process for the subtrees representing these sequences never terminates.

Planning for aspect resolution consists of searching through the observation tree, looking for subtrees in which every aspect is resolved. We call such a subtree a resolving subtree; one test for a resolving subtree is to verify that the collection of restricted observation graphs at the leaf nodes covers the observation graph at the root.

In practice, despite the use of nodal points to define moves, there are many possible moves at every node of the observation tree. As a result, there are many possible resolving subtrees, many more than are practical to enumerate. Additionally, for complicated objects (as will be seen in Section 5), aspect resolution may require several observations; five or six observations are not unexpected. The result of all this is that observation trees are far too large to search exhaustively, so heuristic search is required.
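To make the structure of the search concrete, the sketch below finds a resolving subtree for a toy problem by depth-limited exhaustive search. It is not the heuristic search referred to above, which this section does not specify. The six-wedge diagram is hypothetical (not hexobj), poses are sampled at whole degrees, and the candidate moves are the multiples of 45°, which is where the nodal points of this particular toy diagram fall.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Wedge = Tuple[int, int, str, str]  # (start_deg, end_deg, aspect, aspect_class)
DIAGRAM: List[Wedge] = [
    (0, 90, "a0", "c0"), (90, 135, "a1", "c1"), (135, 180, "a2", "c2"),
    (180, 270, "a3", "c0"), (270, 315, "a4", "c2"), (315, 360, "a5", "c1"),
]
CANDIDATE_MOVES = range(45, 360, 45)   # toy nodal points, excluding the start


def wedge_at(angle_deg: int) -> Wedge:
    angle = angle_deg % 360
    for wedge in DIAGRAM:
        if wedge[0] <= angle < wedge[1]:
            return wedge
    raise ValueError("diagram does not cover the viewing circle")


def observe(psi: int, theta: int) -> str:
    return wedge_at(psi + theta)[3]


def plan(poses: FrozenSet[int], depth: int):
    """Return a resolving subtree using at most `depth` moves, or None if none exists."""
    aspects = {wedge_at(theta)[2] for theta in poses}
    if len(aspects) == 1:
        return ("R", next(iter(aspects)))            # resolved leaf
    if depth == 0:
        return None
    for psi in CANDIDATE_MOVES:
        branches: Dict[str, Set[int]] = {}
        for theta in poses:                          # split poses by what would be seen
            branches.setdefault(observe(psi, theta), set()).add(theta)
        if len(branches) == 1:
            continue                                 # move gives no new information
        subtrees = {c: plan(frozenset(p), depth - 1) for c, p in branches.items()}
        if all(tree is not None for tree in subtrees.values()):
            return ("M", psi, subtrees)              # every outcome leads to resolution
    return None


ROOT = frozenset(theta for theta in range(360) if observe(0, theta) == "c0")
TREE = plan(ROOT, 2)
```

Because the outcomes of each move partition the remaining poses, any tree returned by `plan` has leaves that together cover the restriction at its root, which is the test for a resolving subtree given above. For this toy object no single move resolves class c0, so `plan(ROOT, 1)` returns `None` and two move levels are needed.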

The resolving subtree that results from search of the observation tree is called a resolution tree, since it specifies a sequence of moves that will perform resolution for any aspect. The resolution tree becomes a specification for an on-line executive that performs observations until aspect resolution is complete. In many cases, the resolution tree is quite simple, and contains only a single move level. For example, Figure 11 illustrates the aspect diagram and the corresponding resolution tree for hexobj.

Figure 11: Aspect Diagram and Resolution Tree for Hexobj

The resolution tree shown has three types of nodes. Move nodes, represented by circles and labeled with “M-”, denote sensor moves; the move position is indicated in degrees. For example, the root node is a move node labeled “M-0”, and the sensor position is 0°. All moves are relative to the initial position. Congruent-set nodes, represented by squares and labeled “C-”, denote sets of congruent aspects. The observed class is indicated within the square, as well as the labels of the congruent aspects. Resolved nodes, also represented by squares but labeled “R-”, denote aspects that have been resolved. The leftmost descendant of the root node is a resolved node; the observation resolving the aspect is class-2, and the resolved aspect is aspect-2. The two rightmost descendants of the root node are congruent-set nodes. The rightmost node represents the case that occurs from an observation of class-0, which yields the congruent aspects aspect-0 and aspect-4. Note that all leaf nodes are resolved nodes.

The resolution tree provides directions for resolving congruent aspects. Thus, in the figure, the first action is an observation made from the initial position, 0°. If the observation is class-0, then the aspect observed is aspect-0. The node is a leaf node, so the aspect has been resolved and processing terminates. If the first observation is of class-2, then either aspect-2 or aspect-7 could be the initial aspect. A move of 330° should be made. If the resulting observation is class-1, then the initial aspect was aspect-2; if the resulting observation is class-3, then the initial aspect was aspect-7.

Figure 12 illustrates another, somewhat more complicated example, in which two moves may be necessary to resolve a set of congruent aspects. The examples in the section on experiments show still more complicated cases in which as many as seven observations are required to resolve some aspects.


Figure 12: Aspect Diagram and Resolution Tree Specifying Two Moves
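How an on-line executive might follow a resolution tree can be sketched in Python. The tree below is hand-written for a hypothetical six-wedge diagram; it is not the tree of Figure 11 or Figure 12. The tuple encoding of move ("M"), congruent-set ("C"), and resolved ("R") nodes, and the simulated sensor, are assumptions made for illustration.

```python
from typing import Callable, List, Tuple

Wedge = Tuple[int, int, str, str]  # (start_deg, end_deg, aspect, aspect_class)
DIAGRAM: List[Wedge] = [
    (0, 90, "a0", "c0"), (90, 135, "a1", "c1"), (135, 180, "a2", "c2"),
    (180, 270, "a3", "c0"), (270, 315, "a4", "c2"), (315, 360, "a5", "c1"),
]


def wedge_at(angle_deg: int) -> Wedge:
    angle = angle_deg % 360
    for wedge in DIAGRAM:
        if wedge[0] <= angle < wedge[1]:
            return wedge
    raise ValueError("diagram does not cover the viewing circle")


def observe(psi: int, theta: int) -> str:
    return wedge_at(psi + theta)[3]


# Node encodings: ("M", position, {observed class: child}),
# ("C", congruent aspects, next move node), ("R", resolved aspect).
SECOND_MOVE = ("M", 90, {"c1": ("R", "a0"), "c2": ("R", "a3")})
TREE = ("M", 0, {
    "c0": ("C", ("a0", "a3"),
           ("M", 45, {"c1": ("R", "a0"), "c2": ("R", "a3"),
                      "c0": ("C", ("a0", "a3"), SECOND_MOVE)})),
    "c1": ("C", ("a1", "a5"), ("M", 45, {"c2": ("R", "a1"), "c0": ("R", "a5")})),
    "c2": ("C", ("a2", "a4"), ("M", 45, {"c0": ("R", "a2"), "c1": ("R", "a4")})),
})


def execute(tree, sense: Callable[[int], str]) -> str:
    """Follow the tree, sensing at each move node, until a resolved node is reached."""
    node = tree
    while node[0] != "R":
        if node[0] == "C":
            node = node[2]                  # congruent set: continue with its move
        else:
            node = node[2][sense(node[1])]  # move, observe, branch on the class
    return node[1]
```

Calling `execute(TREE, lambda psi: observe(psi, 200))` simulates a hidden pose of 200° and returns the aspect that was initially visible, using two moves after the first observation.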
4  Extension to Higher Degrees of Freedom

First, we recap the results from the previous section. In the constrained scenario considered there, objects with one degree of freedom in pose were being localized using multiple observations from a sensor with one degree of freedom in position. In fact, since rotation of the object is equivalent to the inverse rotation of the sensor, there is only one degree of freedom in the system. It was convenient to parameterize the space using two independent variables, however. A two-dimensional observation function was defined that mapped object pose and sensor position onto observed aspect class:


$$\Omega : (\Psi \times \Theta \rightarrow \Gamma) \qquad (7)$$

The observation function was shown to have a representation as a two-dimensional square planar surface. Sensor observations were shown to reduce this square into strips, and planning was performed by examining sequences of sensor positions to find sequences that would yield strips corresponding to subsets of the strips resulting from restricting the observation function to individual aspects.

We consider here the extension of planning to three degrees of freedom. Object position is assumed to be known, but the three rotational parameters defining object pose are allowed to vary. The sensor is assumed to be held a fixed distance from the object, leaving only the sensor view axis and rotation about that axis free to vary; that yields three degrees of freedom in sensor position.

The extension of the observation function and observation graph to three degrees of freedom is similar to the
derivation in the single degree of freedom case, the main difference being that the observation graph may take several
different forms, all of which are difficult to visualize because of the number of dimensions, and difficult to work with
because of the corresponding sizes of the data structures.

The fundamental problem with extending the observation graph to three degrees of freedom is that rotations around
multiple axes are not commutative. As a result, the higher-dimensional observation graph does not have a neat,
conceptually simple structure. Moreover, many different representations are possible. For example, pose and position
could be described by Euler angles or quaternions, and each description leads to a different representation of the
observation graph.
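The non-commutativity can be checked numerically. The sketch below is only an illustration: the choice of the x and
z axes, the quarter-turn angle, and the test vector are arbitrary, and the helper names are ours.

```python
import math


def rot_x(angle):
    """Rotation matrix about the x-axis (right-handed convention, radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]


def rot_z(angle):
    """Rotation matrix about the z-axis (right-handed convention, radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply(m, v):
    """Apply a 3x3 matrix to a 3-vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


quarter = math.pi / 2
v = [0.0, 1.0, 0.0]

# Rotate about x first, then about z (the rightmost matrix acts first).
x_then_z = [round(c) for c in apply(matmul(rot_z(quarter), rot_x(quarter)), v)]
# Rotate about z first, then about x.
z_then_x = [round(c) for c in apply(matmul(rot_x(quarter), rot_z(quarter)), v)]

print(x_then_z, z_then_x)
```

The two orderings send the same vector to different places, which is why a three degree of freedom pose cannot be
treated as three independent one degree of freedom problems.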

Abstractly, there is little difficulty in extending our previous results. The observation function, $\Omega$, becomes a
scalar function of two vector-valued variables, $\boldsymbol{\psi}$ and $\boldsymbol{\theta}$, which define sensor
position and object pose, respectively:


$$\Omega : (\boldsymbol{\Psi} \times \boldsymbol{\Theta} \rightarrow \Gamma) \tag{8}$$

As before, aspect, class, and observation restrictions of the observation function partition the observation function
into subfunctions. Equivalently, the higher-dimensional observation graph can be partitioned into disjoint regions
corresponding to either aspect-restrictions, class-restrictions, or observation-restrictions. As before, sequences of
observations can be represented as the intersection of the associated restricted observation graphs. Planning for aspect
resolution is again a search through the tree of possible sensor moves to find a subtree in which every leaf node is a
subset of an aspect restriction.

In the sections below, we add some detail to this abstract description by selecting particular representations and
examining the impact of the representations on the aspect resolution planning process.


4.1  P-spheres  and  O-spheres 


One representation of orientation that has been found useful in computer vision is the solid Gaussian sphere [15],
which we will specialize for the representation of sensor position and object pose, and refer to as the position sphere
(p-sphere) and orientation sphere (o-sphere), respectively. We start by defining the p-sphere.

There is a one-to-one correspondence between points of the solid unit sphere and sensor positions. First, assume that
the object is at the center of the sphere. Then, each point on the surface represents a position of the sensor, with the
view axis aligned towards the sphere center. Rotation around the axis can then be represented by distance from the
center. The following paragraphs add the details necessary to map sensor positions onto the sphere.

Define a sensor coordinate system as shown in Figure 13; the system is left-handed, with the z-axis representing the
sensor view axis. The p-sphere is assumed to have an embedded coordinate system, with the north pole being the
intersection of the p-sphere z-axis with the surface of the sphere.

Select  a  home,  or  base,  position  for  the  sensor,  and  identify  this  position  with  the  north  pole  so  that  the  sensor  points 
towards  the  center  of  the  p-sphere,  and  the  x-  and  y-axes  of  the  sensor  are  aligned  with  the  x-  and  y-axes  of  the  p- 
sphere.  Points  on  the  surface  of  the  sphere  can  be  identified  with  orientations  of  the  view  axis  by  rotating  the  origin  of 
the  sensor  coordinate  system  to  that  point  on  the  surface  and  pointing  the  z-axis  towards  the  center  of  the  p-sphere. 
There  is  obvious  freedom  of  movement  around  the  view  axis,  which  is  eliminated  by  specifying  that  the  sensor  is 
rotated  into  position  directly  from  the  north  pole  along  a  line  of  longitude.  Finally,  let  distance  along  the  view  axis 
towards  the  center  of  the  sphere  represent  rotation  clockwise  around  the  view  axis.  Figure  13  illustrates  the  definition. 


Figure  13:  The  Position  Sphere  Representing  Sensor  Positions 


The p-sphere has two singularities: the south pole and the center. These singularities can be identified in advance and
avoided during the planning process.
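One way to make the p-sphere mapping concrete is sketched below. The text does not say how rotation about the view
axis is scaled onto distance from the surface, so the linear scaling used here (a full turn spans the whole radius) is an
assumption, as are the function names and the use of ordinary spherical coordinates in place of the left-handed sensor
frame of Figure 13.

```python
import math


def psphere_point(colatitude, longitude, roll):
    """Map a sensor position to a point (x, y, z) inside the solid unit sphere.

    colatitude: angle of the view axis from the north pole, in radians.
    longitude:  the line of longitude along which the sensor was moved.
    roll:       clockwise rotation about the view axis, in radians.
    """
    if math.isclose(colatitude, math.pi):
        raise ValueError("the south pole is a singularity of the p-sphere")
    # Assumed linear scaling: roll in [0, 2*pi) maps to depth in [0, 1),
    # so the singular center (depth 1) is never produced.
    depth = (roll % (2 * math.pi)) / (2 * math.pi)
    r = 1.0 - depth
    return (r * math.sin(colatitude) * math.cos(longitude),
            r * math.sin(colatitude) * math.sin(longitude),
            r * math.cos(colatitude))


def psphere_inverse(x, y, z):
    """Recover (colatitude, longitude, roll) from a point of the solid sphere."""
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        raise ValueError("the center is a singularity of the p-sphere")
    colatitude = math.acos(max(-1.0, min(1.0, z / r)))
    longitude = math.atan2(y, x)
    roll = (1.0 - r) * 2 * math.pi
    return colatitude, longitude, roll


home = psphere_point(0.0, 0.0, 0.0)           # base position: the north pole
point = psphere_point(math.pi / 3, math.pi / 4, math.pi / 2)
recovered = psphere_inverse(*point)
```

Under these assumptions the home position maps to the north pole on the surface, and every other sensor position away
from the two singularities can be recovered from its point in the sphere.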

The same general representation can be applied to the process of representing object pose, in a form that we refer to
as the orientation sphere, or o-sphere. We assume that the location of the object is known, and that a pose specifies an
orientation relative to some fixed coordinate system. Then, all poses of an object can be defined by three parameters.
As above, these parameters can be represented as points on a solid sphere, the o-sphere.

Define a coordinate system fixed with respect to the object. Pick one axis of the object, the z-axis, for example, as a
distinguished axis of the object. Start with the object coordinate system aligned with the o-sphere coordinate system,
and identify this pose with the north pole of the o-sphere. To specify any pose of the object, first specify the new
position of the z-axis; this axis alignment is identified with the ray from the center of the sphere having the same
alignment. Next, assume that the z-axis was moved to that position in the most direct way, by movement along a line of
longitude. This assumption specifies new orientations for the x-axis and y-axis. Now, let distance from the surface
toward the o-sphere center represent rotation about the new z-axis.

We can represent the observation function as the successive application of two functions: the first is a function-valued
function that computes the restriction of the observation function to a single object orientation, and the second is a
scalar-valued function that applies the restricted observation function to a specific sensor position and computes the
observation value. More formally:


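$$\Omega(\boldsymbol{\psi}_0, \boldsymbol{\theta}_0) = \Omega|_{\boldsymbol{\theta} = \boldsymbol{\theta}_0}(\boldsymbol{\psi}_0) \tag{9}$$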

The intuition behind this formulation is that every point of the o-sphere has an associated p-sphere. Each p-sphere
represents the set of possible observations as a function of sensor position for a given object pose. We can label each
point of each p-sphere with the class and aspect that would be observed by a sensor at the specified position for an
object in the associated pose. Thus, to determine the value that would result from an observation of an object in a
known pose made from a sensor in a specific position, one accesses the p-sphere associated with the given pose, and
then looks up the value at the point of the p-sphere representing the sensor position.

The representation described above, that of an o-sphere consisting of p-spheres at each point, constitutes the three
degree of freedom observation graph. It would also be possible to construct the observation graph by indexing
o-spheres through a p-sphere, but certain operations become much more complicated.

A continuous implementation of the three degree of freedom observation graph is impractical. Instead, the o-sphere
and p-spheres must be discretized into cells representing ranges of parameters. Each cell of the o-sphere has an
associated p-sphere, or child p-sphere. Similarly, each p-sphere is associated with a specific cell on the o-sphere, which
we will refer to as the parent cell. In what follows, we assume that each cell of the o-sphere contains a flag that can be
turned on or off, and that each cell of a child p-sphere contains the class and aspect that would be observed for the
specified pose and position. Initially, all o-sphere cells are turned on. The three degree of freedom restrictions of the
observation graph can now be formulated by selectively turning various cells off.
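A minimal sketch of this discretized structure is given below. The class names, the dictionary-based cell indexing,
and the cell labels ("pose-a", "north", and so on) are hypothetical; the text does not prescribe a particular data
layout.

```python
from dataclasses import dataclass


@dataclass
class PSphere:
    """Child p-sphere: maps a sensor-position cell to the (class, aspect) seen there."""
    cells: dict


@dataclass
class OCell:
    """One o-sphere cell: an on/off flag plus the child p-sphere for that pose."""
    child: PSphere
    on: bool = True  # initially, all o-sphere cells are turned on


def lookup(osphere, pose_cell, position_cell):
    """Discrete two-step lookup: select the p-sphere for the pose, then index it."""
    return osphere[pose_cell].child.cells[position_cell]


# Hypothetical two-pose, two-position example.
osphere = {
    "pose-a": OCell(PSphere({"north": ("class-1", "aspect-1"),
                             "east": ("class-2", "aspect-4")})),
    "pose-b": OCell(PSphere({"north": ("class-1", "aspect-2"),
                             "east": ("class-3", "aspect-5")})),
}
observed_class, observed_aspect = lookup(osphere, "pose-b", "east")
```

The two-step lookup mirrors the successive application of a function-valued function (pose to p-sphere) and a
scalar-valued function (position to observation).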

An aspect-restricted observation graph consists of all the o-sphere cells (and child p-spheres) for which the specific
aspect is observed by the sensor at its base position. Using our representation, we can construct the aspect-restricted
observation graph by examining the cells at the north pole of each p-sphere. Each o-sphere cell with the given aspect
at the north pole of its child p-sphere is left on, while all others are tagged off. The collection of on cells represents the
aspect-restricted observation graph. The class-restricted observation graphs can be similarly constructed, by leaving
on all o-sphere cells with the correct class at the north pole of the child p-sphere, and turning all other o-sphere cells
off.

Construction of the observation-restricted observation graphs is similar. An observation consists of an ordered pair
($\boldsymbol{\psi}$, $\gamma$). Each p-sphere is examined at the cell containing the position $\boldsymbol{\psi}$. If
the cell contains the observation $\gamma$, then the o-sphere cell associated with that p-sphere is left on; otherwise,
the cell is tagged off. Obviously, this process can be repeated as additional observations are made.

As before, when an observation-restricted observation graph is a subset of an aspect-restricted graph, aspect
resolution is complete.
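The restrictions described above can be sketched as set operations on the on cells of the o-sphere. Everything in the
example is illustrative: the three-pose graph, the cell labels, and the function names are assumptions, and the set of
on cells stands in for the per-cell flags.

```python
BASE = "north"  # hypothetical label for the p-sphere cell at the north pole

# pose cell -> child p-sphere, itself a map: position cell -> (class, aspect)
GRAPH = {
    "pose-a": {"north": ("class-1", "aspect-1"), "east": ("class-2", "aspect-4")},
    "pose-b": {"north": ("class-1", "aspect-2"), "east": ("class-3", "aspect-5")},
    "pose-c": {"north": ("class-2", "aspect-3"), "east": ("class-1", "aspect-1")},
}


def aspect_restriction(graph, aspect):
    """On cells: poses whose child p-sphere shows `aspect` at the north pole."""
    return {pose for pose, psphere in graph.items() if psphere[BASE][1] == aspect}


def class_restriction(graph, cls):
    """On cells: poses whose child p-sphere shows `cls` at the north pole."""
    return {pose for pose, psphere in graph.items() if psphere[BASE][0] == cls}


def observation_restriction(graph, on_cells, position, observed_class):
    """Turn off every on cell whose p-sphere disagrees with the observation."""
    return {pose for pose in on_cells
            if graph[pose][position][0] == observed_class}


def resolved(graph, on_cells):
    """True when the on cells are a non-empty subset of one aspect restriction."""
    return len({graph[pose][BASE][1] for pose in on_cells}) == 1


after_first = class_restriction(GRAPH, "class-1")
after_second = observation_restriction(GRAPH, after_first, "east", "class-3")
```

Repeated calls to `observation_restriction` correspond to intersecting successive observation restrictions; the
`resolved` test is the subset condition for aspect resolution.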

Planning is now a much more complicated task, because the space of possible sensor moves
($\boldsymbol{\psi} \in \boldsymbol{\Psi}$) is three-dimensional, rather than one-dimensional ($\psi \in \Psi$).
Moreover, there is no clear analog of nodal points in the three-dimensional case. However, it is possible to outline a
planning algorithm that searches through a tree of possible sensor moves.

The number of moves that must be checked can be limited by consideration of the observable regions of the
p-spheres. For many sensors, rotation about the sensor view axis does not affect the observation; for these sensors, only
the surfaces of the p-spheres need to be considered. Moreover, for any sensor, the p-spheres are simply rotations of
each other, so only one p-sphere need be examined in order to select possible moves.

One possible search strategy is as follows:

Examine the surface of a p-sphere and determine the minimum extent of the aspects; that is, find the
minimum diameter of the smallest circle that can circumscribe any aspect. This minimum diameter, $d_{min}$,
represents the size of the smallest displacement that will move the sensor out of the smallest region. This
corresponds to the smallest distance between nodal points in the one degree of freedom case, and is a
reasonable choice for a minimum sensor move size in the three degree of freedom case.

At  the  root  node,  determine  all  the  possible  classes  observable  from  the  base  sensor  position.  For  each 
class,  construct  the  class-restricted  observation  graph  and  attach  it  to  a  new  node  of  the  search  tree.  Nodes 
with  attached  observation  graphs  are  observation  nodes. 

Now, at each observation node, it is necessary to construct move nodes corresponding to possible sensor
moves. In principle, one could examine each move corresponding to each cell in the prototype p-sphere.
In practice, this is prohibitive. An alternative strategy is to consider only the moves corresponding to cells
that are integer multiples of $d_{min}$ from the base position. This collection of cells forms a set of bands like
lines of latitude that cover the prototype p-sphere at intervals of $d_{min}$. This is still a large number of
moves, and more heuristics for limiting the number considered could be developed, such as only using
moves along longitudinal circles at intervals of $d_{min}$. Figure 14 illustrates the two heuristics.

For each possible move, it is necessary to examine the corresponding cells of all the p-spheres in the
restricted observation graph; this determines the set of possible observations. The set of possible
observations is then used to construct new observation nodes with updated restricted observation graphs.
Whenever an observation graph is a subset of an aspect-restricted observation graph, search at that node can be
terminated, and the resolved cells of the o-sphere can be marked. Search is terminated when all o-sphere
cells have been marked.
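The strategy above can be sketched as a recursive search. This is a simplified depth-first variant written for
illustration, not the planner used in the experiments: it accepts the first move sequence that resolves every branch,
treats each candidate move as usable once per path, and runs on a hypothetical four-pose graph with made-up cell
labels.

```python
BASE = "north"  # hypothetical label for the base sensor position

# pose cell -> {position cell -> (class, aspect)}; all values are invented.
GRAPH = {
    "pose-a": {"north": ("class-1", "aspect-1"), "east": ("class-2", "aspect-4"),
               "south": ("class-3", "aspect-6")},
    "pose-b": {"north": ("class-1", "aspect-2"), "east": ("class-3", "aspect-5"),
               "south": ("class-2", "aspect-4")},
    "pose-c": {"north": ("class-1", "aspect-3"), "east": ("class-3", "aspect-6"),
               "south": ("class-1", "aspect-1")},
    "pose-d": {"north": ("class-2", "aspect-4"), "east": ("class-1", "aspect-2"),
               "south": ("class-1", "aspect-3")},
}


def plan(graph, cells, moves):
    """Return a resolution subtree for the on cells, or None if none is found."""
    aspects = {graph[pose][BASE][1] for pose in cells}
    if len(aspects) == 1:                      # subset of one aspect restriction
        return ("resolved", next(iter(aspects)))
    for i, move in enumerate(moves):
        # Partition the on cells by the class that would be observed after the move.
        parts = {}
        for pose in cells:
            parts.setdefault(graph[pose][move][0], set()).add(pose)
        if len(parts) < 2:
            continue                           # this move yields no new information
        remaining = moves[:i] + moves[i + 1:]
        subtrees = {cls: plan(graph, part, remaining) for cls, part in parts.items()}
        if all(tree is not None for tree in subtrees.values()):
            return ("move", move, subtrees)
    return None


def build_tree(graph, moves):
    """Root node: one class-restricted observation node per observable class."""
    classes = {psphere[BASE][0] for psphere in graph.values()}
    return {cls: plan(graph,
                      {p for p, ps in graph.items() if ps[BASE][0] == cls},
                      moves)
            for cls in classes}


tree = build_tree(GRAPH, ["east", "south"])
```

For the toy graph, a class-1 observation at the base position needs up to two moves, while a class-2 observation is
resolved immediately.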


Figure 14: Selection of Possible Moves a) All Moves of Distance $k d_{min}$ from Base Position
b) Moves of Distance $k d_{min}$ from Base Position, Sampled at Intervals of $d_{min}$
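The two move-selection heuristics of Figure 14 can be enumerated as follows. The sketch assumes that rotation about
the view axis does not affect the observation, so only the p-sphere surface is sampled, and it interprets the sampling
interval of heuristic (b) as a longitude spacing of $d_{min}$; the numeric values of `D_MIN` and `CELL` are hypothetical.

```python
def candidate_moves(d_min, lon_step):
    """Return (colatitude, longitude) pairs, in whole degrees, on bands k * d_min
    from the base position at the north pole.

    The south pole (colatitude 180) is skipped because it is a singularity.
    """
    longitudes = range(0, 360, lon_step)
    bands = range(d_min, 180, d_min)
    return [(colat, lon) for colat in bands for lon in longitudes]


D_MIN = 30  # hypothetical minimum aspect extent, in degrees
CELL = 10   # hypothetical longitude width of a p-sphere cell, in degrees

all_band_moves = candidate_moves(D_MIN, CELL)   # heuristic (a): every cell on each band
sampled_moves = candidate_moves(D_MIN, D_MIN)   # heuristic (b): bands sampled every d_min

print(len(all_band_moves), len(sampled_moves))
```

With these values the sampled heuristic considers one third as many moves as the full bands, at the cost of possibly
missing a move direction that falls between samples.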


A by-product of this representation is more accurate object localization than that obtainable from just aspect
resolution. Only o-sphere cells consistent with the observations are tagged on, and these cells will be a subset of all
the cells consistent with the initial observation. Thus, this planning method also uses the spatial relationship between
multiple observations to reduce the uncertainty in object pose as much as possible.

The combinatorics of this search paradigm are a function of the resolution of the discretization process used to
construct the o-sphere and the p-spheres. As will be seen in the section on experiments, because of noise associated with
the discretization process, the accuracy with which aspect resolution can be performed is correlated with the
resolution of the discretization process. Thus, to accurately resolve aspects, the combinatorics of planning become
prohibitive. On the other hand, to ease the combinatorial burden of planning, a coarse discretization is required, which
leads to errors in aspect resolution. In the next sub-section, we explore an alternative that trades off some of the
fidelity in sensor position control for a significant reduction in combinatorics.

4.2 The Observation Block

There are several ways to simplify the representation by ignoring various parameters. In this subsection, we explore
an alternative representation that utilizes a three-dimensional parameter space. Consequently, the information
produced by the planner is simpler: in this case, only a specification for the angular distance to move the sensor at each
step; the direction to move the sensor is left unspecified.

Consider the viewing sphere around an object. For each point on the surface of the sphere, one form of the
observation function can be represented as a rectangular planar surface, the observation sheet. We shall show how to
construct these surfaces, then stack them together into a solid, the observation block. Planar slices through the
observation block correspond to the collection of possible observations resulting from an angular displacement of the
sensor. Search through these planar slices can yield a sequence of sensor motions that perform aspect resolution.

Consider a given point on the surface of the viewing sphere. If the problem were limited to moving the sensor along a
great circle through the center of the cell, then the situation would be like that in section 3, and the corresponding
observation function would be one-dimensional, and representable as a line segment with labeled intervals, as
depicted previously in Figure 6.

Now, for each point, define a coordinate system so that all the great circles through the point can be indexed by a
single parameter, $\beta$, which defines the angular offset from a reference great circle. Then, each position along a
great circle can be indexed by parameter $\psi$, which defines the angular displacement along the great circle, with
$\psi = 0$ being at the point.

In what follows, we assume that the viewing sphere has been discretized into a collection of cells, and we represent
each cell by the point at the center of the cell.

Let $c_i$ be one of the cells on the viewing sphere. Then there is an observation function defined with respect to the
coordinate system associated with $c_i$, which is a scalar-valued function of two scalar parameters, $\beta$ and $\psi$:

$$\Omega : (B \times \Psi \rightarrow \Gamma) \tag{10}$$

The observation graph corresponding to this function is a two-dimensional rectangular surface that we refer to as the
observation sheet at $c_i$, and is composed of strips corresponding to the great circles through $c_i$. Although this
structure is similar to the observation graphs of section 3, we give it a different name because the properties are quite
distinct. The observation sheet is defined with respect to a unique point and unique coordinate system, and does not
incorporate any notion of uncertainty. Instead, observation sheets will be combined in a way that will handle
uncertainty.

Each cell $c_i$ now has an associated observation sheet. We can conceptually stack the observation sheets on top of each
other to form a three-dimensional solid, the observation block. While the observation block contains all the
observations possible from moving the sensor around the object, it is not parameterized in such a way that observations
corresponding to specific sensor viewpoints can be determined. In fact, there is little relationship between proximity of
points in the observation block and the position of the sensor or pose of the object. However, the observation block
does make explicit certain relationships.

Suppose that each strip corresponding to a great circle can be tagged on or off. The aspect-restricted observation
block consists of all the sheets in the observation block constructed from points contained within the given aspect.
That is, each strip belonging to any observation sheet associated with a cell contained in the given aspect is tagged on,
while all others are tagged off. Similarly, the class-restricted observation block consists of the union of all the
aspect-restricted blocks associated with all the aspects in the class. Figure 15 illustrates the concept.

Figure 15: Restrictions of the Observation Block
a) An Aspect Restriction b) A Class Restriction

The observation block can also be restricted with respect to specific sensor observations. The observation block has
three axes: $\beta$, $\psi$, and $c$. Planar slices perpendicular to the $c$ axis make up the basis for the
aspect-restrictions and class-restrictions. Similarly, slices perpendicular to the $\psi$ axis form the basis for
observation restrictions. The $\psi$ axis represents the angular displacement of the sensor with respect to the center of
each cell. Thus, the planar slice through $\psi = 0$ represents all the observations possible for the first, or base, sensor
observation. The planar slice through $\psi = \psi_i$ represents all the observations possible after moving the sensor by
angle $\psi_i$; the direction of the displacement is not considered, only the magnitude. An observation restriction of the
observation block is illustrated in Figure 16.


Figure 16: An Observation Block Restricted to $\Omega(90°) =$ class-1


Observation-restricted observation blocks can be constructed for an observation of class $\gamma_i$ at angular
displacement $\psi_j$ by tagging on all the observation strips which contain the value $\gamma_i$ at the displacement.
As before, successive observation restrictions can be intersected to further refine an observation block. Aspect
resolution is complete when the refined observation block is a subset of some aspect-restriction.
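A small sketch of the observation block and its restrictions follows. The two-cell block, the two great circles per
cell, the three displacement steps, and all labels are invented for illustration; strips are tagged on by membership in
a set of (cell, beta) pairs.

```python
# BLOCK[cell][beta] is one strip: the classes seen at successive angular
# displacements (index 0 is psi = 0) along the great circle with offset beta.
BLOCK = {
    "c0": {0: ["class-1", "class-2", "class-2"],
           1: ["class-1", "class-1", "class-2"]},
    "c1": {0: ["class-1", "class-1", "class-3"],
           1: ["class-1", "class-3", "class-3"]},
}
ASPECT_OF = {"c0": "aspect-1", "c1": "aspect-2"}  # aspect containing each cell


def all_strips(block):
    """Every (cell, beta) strip of the block, i.e. the unrestricted block."""
    return {(cell, beta) for cell, sheet in block.items() for beta in sheet}


def aspect_restriction(block, aspect_of, aspect):
    """Strips of all sheets built from cells contained in the given aspect."""
    return {(cell, beta) for cell, beta in all_strips(block)
            if aspect_of[cell] == aspect}


def observation_restriction(block, strips, step, observed_class):
    """Keep the on strips that contain observed_class at the given displacement."""
    return {(cell, beta) for cell, beta in strips
            if block[cell][beta][step] == observed_class}


def resolved(aspect_of, strips):
    """True when the on strips all come from cells of a single aspect."""
    return len({aspect_of[cell] for cell, _ in strips}) == 1


base = observation_restriction(BLOCK, all_strips(BLOCK), 0, "class-1")
far_move = observation_restriction(BLOCK, base, 2, "class-2")
near_move = observation_restriction(BLOCK, base, 1, "class-1")
```

In this toy block the larger displacement resolves the aspect regardless of direction, whereas the smaller one leaves
strips from both aspects on, because only the magnitude of the move is represented.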

Planning for aspect resolution can now be performed by examining planar slices through the $\psi$ axis and searching
through the tree of possible sensor displacements for a subtree in which every strip belongs to some resolved subset.
Search using this representation requires far less storage and computation than search through the space of p-spheres
described previously. However, there is a price to pay for this cost savings, and that price is robustness. Unlike the
previous method, this method does not specify exact sensor positions; only angular displacements are specified.
Consequently, there may be situations in which a sensor must move in a particular direction in order to resolve an
aspect. There is no way to compute this direction using this representation.

A reasonable trade-off may be to implement both methods. Then, in generating a sensing strategy for an object, the
cheaper method employing observation blocks can be used until either a complete plan is generated, or it is
determined that the object requires more exact sensor positioning. In the latter case, the p-sphere method can be applied.
Since the planning is performed off-line, the increased computational cost is not critical.

5 Experiments

In the preceding sections, we discussed the theory and implementation for a methodology of using multiple sensor
observations to perform aspect resolution, which is equivalent to rough localization. The need for multiple
observations was driven by the phenomenon of congruent aspects, which were defined as aspects that are distinct, yet
indistinguishable using the available feature set.

In this section, we report the results of applying this methodology to aspect classification in two different domains:
specular objects and finger gap sensing. Most of the experiments were performed on synthetic data. The purpose of
the experiments was to determine the accuracy of performing aspect classification using multiple observations. Since
multiple observations are unnecessary in the absence of congruent aspects, our experimental domains were selected
to maximize the potential for congruent aspects. The domains selected were:

•  specular objects

Highly specular objects are difficult to recognize because the specularities that characterize the domain are
extremely prominent, yet highly variable with respect to object pose and sensor location. Because of the
effects of surface complexity and inter-reflections, specularities are very difficult to predict analytically.
Moreover, a single image of a specular object yields very little information about the overall object shape,
since a specularity is a local phenomenon; many different object poses can yield the same pattern of
specularities.

•  finger gap sensing

It is possible to configure parallel jaw grippers with sensors that report when a jaw has contacted an
object, and the distance between jaws (the finger gap). With such a configuration, the gripper can be used
to measure object diameters. Wallach and Canny [24] showed how sequences of diameter measurements
can be used to determine stable grasps.

In the experiments described below, images of the objects were generated for a representative sample of sensor
positions, the images were analyzed, and aspects were defined based on appearances. For example, in the specular
domain, aspects were defined by the number of detected specularities. Each aspect was then characterized by
averaging the values of the features for each sensor position within that aspect. Finally, an aspect classification tree was
generated by examining the capability to distinguish aspects on the basis of the computed ranges of features. An aspect
classification tree specifies both the aspect classes and the sequence of tests needed for classification. All of the above
tasks were performed autonomously in the specular domain. In the finger gap domain, aspects were selected by hand.

The  construction  and  use  of  a  resolution  tree  assumes  that  aspects  can  be  correctly  classified.  Resolution  is  performed 
by  analyzing  sequences  of  aspect  classifications  with  respect  to  sensor  positions.  Therefore,  the  accuracy  of  aspect 
classification  will  affect  the  accuracy  of  aspect  resolution.  In  the  experiments  reported  below,  the  accuracy  of  aspect 
resolution  was  related  to  the  accuracy  of  aspect  classification. 


5.1  Specular  Domain 


Specular  objects  are  those  with  smooth,  mirror-like  surfaces.  Specular  objects  are  difficult  to  recognize  and  localize 
because  the  positions  of  the  specularities  are  not  fixed  like  geometric  features  of  the  object  (such  as  edges  or  vertices) 
but  change  position  as  the  light  source  and  sensor  change  position.  Moreover,  the  information  extracted  from  specu¬ 
larities  is  very  sparse  and  contains  information  about  local  shape  only. 

For each of the objects used, a CAD-like model of the object was constructed. The model was used as the input to an
appearance simulator [8] which generated 360 images of the object, corresponding to the images that would be
obtained by rotating a co-located light-source and sensor completely around the object, making observations at 1°
intervals. Thresholding was applied to each image to extract the specularities, and then each specularity was analyzed
to compute the feature values. Throughout the experiments in the specular domain, the features used were: number of
specularities, area of the largest specularity, dominant eigenvalue of the largest specularity, eigenvalue ratio of the
largest specularity, maximum distance between specularities, and minimum distance between specularities.
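The six features can be sketched as follows for specularities that have already been extracted as lists of pixel
coordinates. Two interpretations here are assumptions rather than statements from the text: the eigenvalues are taken
to be those of the covariance matrix of the pixel coordinates of the largest specularity, and distances between
specularities are measured between centroids. The function names and the sample blobs are hypothetical.

```python
import math
from itertools import combinations


def centroid(blob):
    """Mean (row, col) of a blob given as a list of pixel coordinates."""
    n = len(blob)
    return (sum(r for r, _ in blob) / n, sum(c for _, c in blob) / n)


def covariance_eigenvalues(blob):
    """Eigenvalues (largest first) of the 2x2 covariance of the pixel coordinates."""
    cr, cc = centroid(blob)
    n = len(blob)
    srr = sum((r - cr) ** 2 for r, _ in blob) / n
    scc = sum((c - cc) ** 2 for _, c in blob) / n
    src = sum((r - cr) * (c - cc) for r, c in blob) / n
    mean = (srr + scc) / 2
    spread = math.sqrt(((srr - scc) / 2) ** 2 + src ** 2)
    return mean + spread, mean - spread


def specular_features(blobs):
    """Compute the six features for a non-empty list of specularity blobs."""
    largest = max(blobs, key=len)
    big, small = covariance_eigenvalues(largest)
    dists = [math.dist(centroid(a), centroid(b)) for a, b in combinations(blobs, 2)]
    return {
        "count": len(blobs),
        "largest_area": len(largest),
        "dominant_eigenvalue": big,
        "eigenvalue_ratio": big / small if small > 0 else math.inf,
        "max_distance": max(dists) if dists else 0.0,
        "min_distance": min(dists) if dists else 0.0,
    }


# Three made-up specularities: a 2x2 square and two 2-pixel blobs.
blobs = [
    [(0, 0), (0, 1), (1, 0), (1, 1)],
    [(0, 4), (1, 4)],
    [(5, 0), (5, 1)],
]
features = specular_features(blobs)
```

A feature vector of this kind is what an aspect classification tree would test against the ranges computed for each
aspect class.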

Each experiment consisted of selecting an object pose (rotation) at random, and feeding the rotated object model to
the system as input. The execution executive used the resolution tree generated by the planner, and the classification
tree used to generate aspect classes. The experiment proceeded by generating an image of the object, processing the
image to extract the feature set, and classifying the input image as an instance of an aspect class. Then, depending on
the instructions in the resolution tree, the process was halted if a leaf node was reached (meaning that resolution was
completed), or the simulated sensor was rotated with respect to the model as specified by the resolution tree and the
image generation, processing, and classification steps repeated. If the feature set matched no known aspect, the
experiment was terminated with an error message. If the process terminated normally (at a leaf node), then the resulting
aspect identification was compared to the known input to determine whether resolution had been correctly performed,
and the experiment was labeled correct or incorrect, accordingly. Thus, three results were possible for each
experiment: correct resolution, incorrect resolution, or error (unresolvable).
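The experimental loop can be sketched as a walk down a resolution tree. The tree encoding, the `classify` callback
(standing in for image generation, feature extraction, and classification), and the three toy trials are all
hypothetical; only the three possible outcomes come from the description above.

```python
# Hypothetical resolution tree: the root maps the first observed class to a node.
# A node is ("resolved", aspect) or ("move", rotation_deg, {class: child node}).
TREE = {
    "class-1": ("move", 90, {"class-2": ("resolved", "aspect-1"),
                             "class-3": ("resolved", "aspect-2")}),
    "class-2": ("resolved", "aspect-3"),
}


def run_trial(tree, classify, true_aspect):
    """Execute one trial and return 'correct', 'incorrect', or 'error'.

    classify(angle) returns the class observed with the sensor rotated by
    `angle` degrees from its base position, or None if no known aspect matches.
    """
    angle = 0
    node = tree.get(classify(angle))
    while node is not None:
        if node[0] == "resolved":              # leaf node: resolution completed
            return "correct" if node[1] == true_aspect else "incorrect"
        _, rotation, branches = node
        angle += rotation                      # rotate the simulated sensor
        node = branches.get(classify(angle))
    return "error"                             # feature set matched no known branch


good = run_trial(TREE, {0: "class-1", 90: "class-2"}.get, "aspect-1")
misclassified = run_trial(TREE, {0: "class-1", 90: "class-3"}.get, "aspect-1")
unmatched = run_trial(TREE, {0: "class-1"}.get, "aspect-1")
```

The second trial shows how a single misclassification after a move leads to an incorrect resolution, and the third how
an unrecognized observation ends the trial as unresolvable.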

As mentioned above, the accuracy of aspect resolution is dependent upon the accuracy of aspect classification, which
is in turn dependent upon the fidelity of the sampling. To explore this effect, the fidelity of the sampling was varied. In
different sets of experiments, aspect classification trees were generated using sample object images generated at
intervals of 1°, 2°, 4°, 6°, 8°, and 10°.

5.1.1  Sphereworld 

Consider the object in Figure 17. The object, called ell, is composed of four spheres connected to form an "L" shape.
The figure depicts a view of the object looking down the z-axis, four representative views from the x-y plane, and the
specularities extracted from those views.

360 images of ell were generated using an appearance simulator. Each of the images was thresholded to extract the
specularities, and features of the specularities were computed. On the basis of these features, the images were
grouped together into aspects of four distinguishable classes. In all, 12 aspects resulted. The aspect diagram for the
object is shown in Figure 18, along with the specularities that characterize each aspect. The resolution tree that was
generated for ell is shown in Figure 19.

100 resolution trials were executed. The initial rotations were selected at random from the interval [0.00, 360.00),
with a resolution of 0.01°. No errors were recorded; resolution was correctly performed on each trial.

The experiment was repeated for different sampling intervals. The relationship between sampling interval and aspect
classification trees was immediately evident, since the number of aspects and the number of classes varied with the
sampling interval. 100 resolution trials were executed for each sampling interval. The results are shown in Table 1.
As expected, the results are more accurate the more densely sampled the object model is. This is clearly indicated by
the decreasing number of correct resolutions as the sampling interval grows. Another interesting point is the number
of unresolvable cases, which increased with increasing sampling interval.

Table 1: Aspect Resolution Results for ell with Different Sampling Intervals

sampling interval    1°    2°    4°    6°    8°   10°
total tests         100   100   100   100   100   100
correct             100    99    94    87    86    69
incorrect             0     1     4    11    10    19
unresolvable          0     0     2     2     4    12

Figure 17: Sample Images and Extracted Specularities for L-shaped Object
(panels: a) top view; raw image; specularities)

An analysis of the errors proved to be interesting. The most common error was due to something we refer to as a
border effect. The aspect diagram depicts sharp boundaries between aspects. In reality, the boundaries are somewhat
blurred. For example, in Figure 18, aspects 1-0 and 0-4 are adjacent. Aspect-1-0 belongs to class-1, which is
characterized by the presence of 4 specularities. Aspect-0-4 belongs to class-3, which is characterized by the presence of 3
specularities. But, even though specularities are unstable features, they do not instantaneously wink out of existence.


Figure 18: Aspect Diagram for ell

In actuality, a specularity gets smaller and smaller, pixel by pixel. In this case, as the sensor moves counter-clockwise
around ell, one of the specularities in aspect-1-0 begins to get smaller, and shrinks until it disappears. The detection of
that specularity is dependent on the performance of the feature extraction module, and is affected by quantization,
sensor noise, a threshold on minimum size of a specularity, and other factors. Thus, while the sensor may be
positioned to observe aspect-1-0 (class-1), one of the characteristic specularities may not be detected, resulting in an
erroneous classification of the scene as an instance of class-3. Consequently, the entire process of resolution will fail. This
is an example of a border effect. Finer sampling means that each specularity will be more finely sampled, and the
gradual effects of appearing and disappearing will be more exactly modeled. Table 1 shows that incorrect
resolutions generally increase with increasing sampling interval.


Figure 19: Resolution Tree for ell

Unresolvable cases can result when sampling is too coarse. Since specularities appear and disappear over small
ranges of sensor motion, a specularity can be missed entirely when the samples fall on either side of the region in which
it appears. Thus, when an observation falls between samples and detects the unknown specularity, the result is an image
that belongs to no known class. As expected, the number of unresolvable cases increases with increasing sampling
interval.
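
Both failure modes can be seen in a toy classifier that counts specularities. The sketch below is an illustration under stated assumptions, not the report's feature extraction module: blobs are represented only by their pixel areas, `min_area` stands in for the minimum-size threshold, and the count-to-class mapping contains just the two classes named in the text (4 specularities for class-1, 3 for class-3). All names are hypothetical.

```python
from typing import Dict, List, Optional

# Only these two classes are described in the text; others are omitted.
COUNT_TO_CLASS: Dict[int, str] = {4: "class-1", 3: "class-3"}


def classify_by_specularity_count(blob_areas: List[int],
                                  min_area: int,
                                  count_to_class: Dict[int, str] = COUNT_TO_CLASS) -> Optional[str]:
    """Classify a view by the number of specularities at least min_area pixels large.

    Returns None when the count matches no known class (an unresolvable case).
    """
    detected = sum(1 for area in blob_areas if area >= min_area)
    return count_to_class.get(detected)


# Well inside the aspect: all four specularities are detected.
inside = classify_by_specularity_count([40, 38, 35, 6], min_area=3)
# Near the border: one specularity has shrunk below the size threshold,
# so the view is assigned to the neighbouring class (a border effect).
near_border = classify_by_specularity_count([40, 38, 35, 2], min_area=3)
# A specularity that the model never sampled yields an unknown count.
unknown = classify_by_specularity_count([40, 38, 35, 30, 12], min_area=3)
```

Here `inside` is `"class-1"`, `near_border` is `"class-3"`, and `unknown` is `None`.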

Additionally, border effects and unresolvable cases can be encountered at any given observation. From Figure 19, the
resolution tree for ell shows that as many as five observations may be necessary for resolution. That presents five
opportunities for either border effects or unresolvability to be encountered.
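
To see how these opportunities compound, suppose, purely for illustration, that each observation fails independently with the same probability p. The report makes no such assumption and gives no per-observation failure rate; the value used below is hypothetical. Under that assumption, the chance that at least one of k observations fails is 1 - (1 - p)^k.

```python
def prob_any_failure(p_fail_per_observation: float, n_observations: int) -> float:
    """Probability that at least one of n independent observations fails."""
    return 1.0 - (1.0 - p_fail_per_observation) ** n_observations


# Hypothetical 2% failure rate per observation, five observations in sequence.
five_step_risk = prob_any_failure(0.02, 5)
```

With these illustrative numbers the five-observation risk is a little under 10%, roughly five times the single-observation risk.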

In addition to the simulation experiments, several actual trials were performed in the laboratory. Two sphereworld
objects were constructed and painted glossy black. One object was ell, shown simulated above. The other, called tee,
consisted of four spheres connected to form a “T” shape.

For  each  experiment,  the  object  was  placed  on  a  black  rotation  table,  in  front  of  a  black  background.  Consequently, 
each  object  was  fairly  difficult  to  segment  using  standard  image  processing  techniques.  A  pose  for  the  object  was 
selected,  and  the  rotation  table  moved  to  the  appropriate  position.  The  camera  was  fixed,  and  a  bright  point  source 
was co-located with the camera.

The first set of trials involved the object tee. For this set of trials, the room was darkened so that the only light source
was the point source co-located with the camera. On each trial, resolution was correctly performed.

The second set of trials involved the object ell. For this set of trials the room lights were left on. Again, on each trial,
resolution was correctly performed. One of the trials is illustrated in Figure 20. For this trial, ell was placed in a pose
corresponding to a relative rotation of 120°. The figure shows the path taken through the resolution tree. To the right
of the tree, both the input and processed images resulting from the observations are shown; each pair of images is
shown to the right of the resolution tree node at which the observation was made.


While sphereworld is a fairly simple domain, the actual and simulated results demonstrate the potential of the
multiple observation approach for disambiguating object poses. The next subsection deals with a more complicated
domain.


Figure 20: Resolution Path for Object ell


5.1.2  Aircraft  Recognition 

In general, the more complicated the object, the more complex will be the patterns of specularities detectable from
different viewpoints. In this section, we apply our methodology for aspect resolution to a fairly complex object: a
stylized jet. Figure 21 presents an overhead view of the jet model. For the recognition experiment, equatorial views of
the aircraft were used; several of these views, along with the extracted specularities, are also shown in Figure 21.

360 images of the jet were generated using an appearance simulator. Each of the images was thresholded to extract
the specularities, and features of the specularities were computed. On the basis of these features, the images were
initially grouped together into aspects. However, because of the wide variation in appearances, 48 aspects resulted. The
presence of many small aspects dramatically increases the number of nodal points in each observation graph, and
hence the branching factor of the search tree. To reduce the number of aspects, aspects spanning less than 5° were
merged into neighboring aspects. The result was 24 aspects falling into 2 aspect classes. The aspect diagram for the
jet model, augmented with the pattern of specularities which characterize the aspects, is shown in Figure 22.
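
The merging step might look like the following sketch. The report does not say which neighbor absorbs a small aspect, so this version assumes the larger of the two circular neighbors does, and it then fuses any adjacent aspects that end up with the same class. Aspects are reduced to `(class_label, extent_in_degrees)` pairs in circular order; the function names and the example values are hypothetical.

```python
from typing import List, Tuple

Aspect = Tuple[int, float]  # (class_label, extent_in_degrees)


def _coalesce(aspects: List[list]) -> List[list]:
    """Fuse circularly adjacent aspects that share a class label."""
    out: List[list] = []
    for cls, ext in aspects:
        if out and out[-1][0] == cls:
            out[-1][1] += ext
        else:
            out.append([cls, ext])
    # The first and last entries are neighbours on the circle.
    if len(out) > 1 and out[0][0] == out[-1][0]:
        out[0][1] += out[-1][1]
        out.pop()
    return out


def merge_small_aspects(aspects: List[Aspect], min_extent: float) -> List[Aspect]:
    """Absorb aspects narrower than min_extent into their larger neighbour."""
    work = _coalesce([list(a) for a in aspects])
    while len(work) > 1:
        i = min(range(len(work)), key=lambda k: work[k][1])
        if work[i][1] >= min_extent:
            break  # every remaining aspect is wide enough
        left, right = (i - 1) % len(work), (i + 1) % len(work)
        target = left if work[left][1] >= work[right][1] else right
        work[target][1] += work[i][1]
        del work[i]
        work = _coalesce(work)
    return [(cls, ext) for cls, ext in work]


merged = merge_small_aspects([(1, 100.0), (2, 3.0), (1, 50.0), (2, 207.0)], min_extent=5.0)
```

In this toy case the 3° aspect is absorbed, its two class-1 neighbors fuse, and `merged` is `[(1, 153.0), (2, 207.0)]`; the total extent of 360° is preserved.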

Even with 24 aspects, the branching factor for the search tree was huge, resulting in a time-consuming search. The
resulting resolution tree was fairly large, consisting of over 100 nodes. The resolution tree for the jet is shown in
Figure 23.


As in the case of ell, 100 resolution trials were executed. The initial rotations of the jet were selected at random from
the interval [0.00, 360.00), with a resolution of 0.01°. No errors were recorded, although one incorrect resolution
resulted. The one incorrect resolution was due to an incorrect classification at the first observation (which was an
instance of a border effect).

Figure 21: Sample Images and Extracted Specularities for Jet Model (panel a: overhead view)

Figure 22: Aspect Diagram for Jet Model

Again, as in the case of ell, the experiment was repeated for different sampling intervals. The results, for sampling
intervals of 1°, 2°, 4°, 6°, 8°, and 10°, are shown in Table 1. Again, a very strong relationship between sampling
interval and resolution accuracy is apparent. Resolution was performed correctly at least 97% of the time when the
sampling interval was small (2° or less). However, as the sampling interval increased, the performance of the aspect
resolution process decreased quickly; at sampling intervals greater than 5°, fewer than half of the resolution trials were
correct; most of the other trials were unresolvable.

The  results  on  the  jet  model  demonstrate  the  practicality  of  the  multiple  observation  paradigm  in  a  fairly  complicated 
domain. 


Figure 23: Resolution Tree for Jet Model


5.2  Finger  Gap  Sensing 

Wallack and Canny [24] implemented a system for performing object localization by examining the distance between
parallel jaw grippers, the finger gap. The goal of their work was to determine stable grasps of objects. They noted
that, when the object is touched without being moved, the finger gap measures the object diameter, which in turn
determines a set of possible states of the object. Multiple measurements at different grasp angles further constrain the
set of possible states, and measurements can be made until the set of possible states includes only stable grasps.
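
The way successive diameter measurements narrow the set of possible states can be sketched as follows. This is not Wallack and Canny's algorithm; it is a minimal illustration that assumes a planar polygon given by its vertices, orientations sampled at whole degrees, a fixed measurement tolerance, and the convention that the jaws close along direction `orientation + grasp_offset`. All names and the rectangle example are hypothetical.

```python
import math
from typing import List, Set, Tuple

Point = Tuple[float, float]


def diameter(vertices: List[Point], angle_deg: float) -> float:
    """Width of the shape measured along the direction at angle_deg."""
    t = math.radians(angle_deg)
    ux, uy = math.cos(t), math.sin(t)
    projections = [x * ux + y * uy for x, y in vertices]
    return max(projections) - min(projections)


def consistent_orientations(vertices: List[Point],
                            measurements: List[Tuple[float, float]],
                            tol: float = 0.05,
                            step_deg: int = 1) -> Set[int]:
    """Orientations (degrees) consistent with every (grasp_offset, diameter) pair."""
    candidates = set(range(0, 360, step_deg))
    for offset, measured in measurements:
        # Each measurement keeps only orientations that predict that diameter.
        candidates = {th for th in candidates
                      if abs(diameter(vertices, th + offset) - measured) <= tol}
    return candidates


# A 4 x 2 rectangle centred at the origin.
rect = [(2.0, 1.0), (-2.0, 1.0), (-2.0, -1.0), (2.0, -1.0)]
after_one = consistent_orientations(rect, [(0.0, 4.0)])
after_two = consistent_orientations(rect, [(0.0, 4.0), (90.0, 2.0)])
```

A single measurement of 4.0 leaves several distinct orientation ranges (near 0° but also near 53°, among others), while adding a second measurement at a 90° offset leaves only 0° and 180°, which the rectangle's symmetry makes indistinguishable.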

By  making  the  possible  stable  states  correspond  to  aspects,  and  diameter  measurements  to  feature  values,  the  finger 
gap  sensing  domain  falls  into  the  one  degree  of  freedom  domain  for  which  sensor  planning  has  been  demonstrated. 
That  is,  planning  for  stable  grasps  using  finger  gap  measurements  is  a  subset  of  the  more  general  scenario  of  a  single 
sensor constrained to one rotational degree of freedom. To demonstrate this, the finger gap sensing domain was simulated,
and planning was performed using the same software as for the specular domain above.

Table 1: Aspect Resolution Results for Jet Model with Different Sampling Intervals

sampling interval    1°    2°    4°    6°    8°   10°
total tests         100   100   100   100   100   100
correct              99    97    69    46    26    34
incorrect             1     3    11    24    12     9
unresolvable          0     0    20    30    62    57

The simulation was performed by building a CAD model of a planar shape, rotating it in the plane, and generating an
image of the shape from above. Simple thresholding and blob analysis were used to determine the diameter of the
shape, which was measured in pixels. Figure 24 shows the image of the object, called p-star, the thresholded image,
and the thresholded image bounded by two bars representing the inner edges of the fingers.


Figure  24:  Finger  Gap  Simulation  of  Object  p-star 
a) Image of Planar Object b) Thresholded Image c) Computed Diameter


As in the specular domain, samples of the object were obtained at 1° intervals. Figure 25 shows the resulting graph of
the diameters. On a purely heuristic basis, diameter thresholds were established and used to segment the diameter
function into sets corresponding to aspects. The initial aspects were merged together when possible to ensure a
minimum aspect extent of 10°. The resulting aspect diagram is shown in Figure 26, along with the resolution tree that was
generated. Note that the aspect diagram is symmetrical. The leaf nodes of the resolution tree, unlike those in previous
examples, are denoted with an “S” prefix, indicating that the leaf contains more than one constituent aspect, which
cannot be further resolved due to the symmetry of the aspect diagram.

Figure 25: Sampled Diameter Function for Planar Object p-star

Figure 26: Aspect Diagram and Resolution Tree for Planar Object p-star

100 resolution trials were executed. The rotations were selected at random from the interval [0.00, 360.00), with a
resolution of 0.01°. 95 trials were resolved correctly, 3 were incorrect, and 2 were unresolvable. The experiment was
repeated for sampling intervals of 2°, 4°, 6°, 8°, and 10°, and the results are reported in Table 1. In general, increasing
the sampling interval led to an increasing number of errors and unresolvable cases. As before, the incorrect
resolutions were due to border effects, while unresolvable cases were due primarily to undersampling. Unlike the specular
domain, even the case in which diameters were sensed at 1° intervals exhibited some errors. Further analysis showed
that this was an effect of the simulation used. No anti-aliasing was employed in the simulation, so the diameters
changed by 1 or 2 pixels over very small rotations.

Table 1: Aspect Resolution Results for p-star with Different Sampling Intervals

sampling interval    1°    2°    4°    6°    8°   10°
total tests         100   100   100   100   100   100
correct              95    88    61    78    39    44
incorrect             3    10     5    17    53    35
unresolvable          2     2    34     5     8    21

An interesting fact to note in the table is the large difference in performance for sampling intervals of 4° and 8°
compared to other sampling intervals. This is an effect of the frequency of sampling not matching the frequency of the
diameter function. In fact, with an 8° sampling interval, the resulting aspect diagram was not even symmetric.
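
The dips at 4° and 8° can be checked directly from the p-star table. The short script below only re-tabulates the reported counts (column labels follow the sampling intervals listed in the text) and flags intervals where accuracy is lower than at the next coarser interval; the helper name `dips` is hypothetical.

```python
from typing import List

intervals = [1, 2, 4, 6, 8, 10]          # sampling interval in degrees
correct = [95, 88, 61, 78, 39, 44]       # counts copied from the p-star table
incorrect = [3, 10, 5, 17, 53, 35]
unresolvable = [2, 2, 34, 5, 8, 21]


def dips(steps: List[int], n_correct: List[int]) -> List[int]:
    """Intervals whose accuracy is lower than at the next coarser interval."""
    return [steps[i] for i in range(len(n_correct) - 1)
            if n_correct[i] < n_correct[i + 1]]


# Every column should account for all 100 trials.
column_totals = [c + i + u for c, i, u in zip(correct, incorrect, unresolvable)]
anomalous = dips(intervals, correct)
```

Each column sums to 100 trials, and `anomalous` is `[4, 8]`, the two intervals singled out above.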
6 Discussion and Conclusions


There  are  many  situations  in  which  a  single  sensor  observation  may  be  insufficient  to  unambiguously  interpret  an 
image. The reasons for this may be as diverse as sensor and processing noise, object occlusion, an inadequate set of
features,  or  ambiguous  objects. 

Many ambiguous cases can be resolved through the use of additional sensor observations made from different sensor
positions. Intuitively, this makes sense, since observations from different viewpoints will contain information about
different parts of the scene. However, simple random selection of additional sensor positions is likely to be
inadequate or at least inefficient. Ideally, each new observation should reveal new, useful information, and the best way to
guarantee the maximum effectiveness of each observation is to plan them on the basis of what is already known.

In  this  report,  we  presented  the  conceptual  basis  for  model-based  planning  of  observation  sequences  for  the  problem 
of  aspect  resolution,  which  is  equivalent  to  rough  localization.  We  first  derived  a  planning  methodology  for  the 
restricted case of one degree of freedom in object pose and sensor position, then extended the methodology to the
more  general  case  of  three  degrees  of  freedom  each. 

The planning methodology depended upon the observation graph, a representation of possible observations as a
function of both sensor position and object pose. We defined restrictions of the observation graph to be subsets that
correspond to observations of particular aspects or classes by the sensor in a base position. In the single degree of
freedom case, the observation graph is a planar rectangular region, and the restrictions are strips parallel to the x-axis.
Successive observations reduce the observation graph into strips parallel to the x-axis. Aspect resolution is complete
when the remaining strips are a subset of an aspect restriction.
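
A discretized version of this reduction can be sketched as follows. It is a simplified stand-in for the observation graph, not the report's data structure: poses are integer sample indices, `class_at[v]` is the class seen from view index `v`, `aspect_at[p]` is the aspect containing pose `p`, and the view seen from sensor position `s` with the object at pose `p` is assumed to be `(p + s) mod n`. The toy tables are hypothetical.

```python
from typing import Sequence, Set


def restrict(candidates: Set[int], class_at: Sequence[int],
             sensor_pos: int, observed_class: int) -> Set[int]:
    """Keep only the poses that would produce observed_class from sensor_pos."""
    n = len(class_at)
    return {p for p in candidates
            if class_at[(p + sensor_pos) % n] == observed_class}


def resolved(candidates: Set[int], aspect_at: Sequence[int]) -> bool:
    """Resolution is complete when all remaining poses lie in one aspect."""
    return len({aspect_at[p] for p in candidates}) == 1


# Toy model with 8 sampled views, 2 classes, and 4 aspects.
class_at = [1, 1, 2, 1, 1, 2, 2, 2]
aspect_at = [0, 0, 1, 2, 2, 3, 3, 3]

poses = set(range(8))
poses = restrict(poses, class_at, sensor_pos=0, observed_class=1)
first_step = set(poses)                  # two class-1 aspects remain
poses = restrict(poses, class_at, sensor_pos=3, observed_class=2)
```

After the first observation the candidates `{0, 1, 3, 4}` span two aspects of the same class; the second observation, taken after moving the sensor by three samples, leaves `{3, 4}`, which lies within a single aspect.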

We implemented a planner for the restricted case, dealing with the problem of aspect resolution. The output of the
planner was called a resolution tree, and specified sequences of sensor positions that would guarantee resolution of
ambiguous, or congruent, aspects. Although dependent on a variety of heuristics to prune the search space, the
planner was capable of computing resolution trees for fairly complicated objects. An on-line executive was written to
utilize resolution trees to perform aspect resolution. The planner was tested by applying it to objects in the domain of
specular objects, and in the finger gap sensing domain. The computed resolution trees were tested by using the on-line
executive to perform aspect resolution on both real and simulated objects, and results were very good.

The  conclusion  to  be  drawn  from  the  research  reported  here  is  that  multiple  observations  can  be  effectively  employed 
to  resolve  ambiguous  scenes.  Moreover,  sequences  of  sensor  positions  can  be  planned  off-line  in  advance  for  known 
objects, and these sequences can be stored and executed on-line as necessary. While our implementation and
experiments dealt with a constrained domain having only one degree of freedom in each of object pose and sensor position,
the  extension  to  three  degrees  of  freedom  in  each  was  derived  and  presented. 

Conceptually, the planning problem is well-defined. Each sensor observation reduces the observation graph into
strips, each of which can be further reduced by additional observations, stopping only when a particular strip is a
subset of an aspect restriction. The planning process is representable by a tree structure, the resolution tree, with the
initial observation at the root, and additional observations fanning out below it. In principle, the entire resolution tree
could be computed and stored for use by the on-line executive.

In practice, however, the number of possible moves makes the planning problem quite difficult. The concept of nodal
points in the observation graph reduced the number of moves to a finite, but possibly large, set. With N possible
moves, the branching factor of the resolution tree is N, and the number of nodes in the tree is O(N^d), where d is the
depth of the tree. Therefore, rather than compute the entire resolution tree, it is only practical to compute a single
resolving subtree. The selection of the subtree is necessarily heuristic, and must balance factors such as depth of
the tree, time to complete search, and cost of sensor motions. At this time, only a few simple search heuristics have
been tried, with the main emphasis being on reducing search time.
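
The growth in tree size is easy to make concrete. The helper below counts the nodes of a full tree with branching factor N and depth d, that is 1 + N + N^2 + ... + N^d, which is O(N^d). The values used in the example (N = 24, d = 5) are chosen for illustration only and are not reported as a branching factor or depth in this work.

```python
def full_tree_nodes(n_moves: int, depth: int) -> int:
    """Number of nodes in a complete tree with the given branching factor and depth."""
    return sum(n_moves ** level for level in range(depth + 1))


# Illustrative values only: 24 possible moves at each of 5 levels.
example_size = full_tree_nodes(24, 5)
```

With these values the full tree already has 8,308,825 nodes, which is why only a single resolving subtree is computed.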

The discussion to this point has been limited to the task of object localization. In fact, the same methodology can be
easily extended to the task of object identification as well. Given a collection of objects, aspects can be identified for
each object, and a classification tree built for all possible aspects. Then, the observation graphs for each object can be
constructed. Now, a given observation can be used to restrict each of the observation graphs, and object identification
is complete when the strips remaining after a sequence of observations belong to a single object.
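
One way to picture this extension is to track (object, pose) pairs instead of poses alone. The sketch below makes the same simplifying assumptions as a discretized one-degree-of-freedom model (integer pose indices, view index `(pose + sensor_pos) mod n`), and the two object models "A" and "B" are invented for the example.

```python
from typing import Dict, List, Set, Tuple

Hypothesis = Tuple[str, int]  # (object name, pose index)


def restrict_all(candidates: Set[Hypothesis], models: Dict[str, List[int]],
                 sensor_pos: int, observed_class: int) -> Set[Hypothesis]:
    """Restrict every object's candidate poses with the same observation."""
    return {(obj, p) for obj, p in candidates
            if models[obj][(p + sensor_pos) % len(models[obj])] == observed_class}


def identified(candidates: Set[Hypothesis]) -> bool:
    """Identification is complete when one object explains all that remains."""
    return len({obj for obj, _ in candidates}) == 1


# Hypothetical class tables for two objects sampled at 4 views each.
models = {"A": [1, 1, 2, 2], "B": [1, 2, 2, 2]}
hyps = {(obj, p) for obj, table in models.items() for p in range(len(table))}

# Observations that object "B" at pose 0 would produce.
for sensor_pos, seen in [(0, 1), (1, 2), (3, 2)]:
    hyps = restrict_all(hyps, models, sensor_pos, seen)
```

After the three observations only `("B", 0)` remains, so both the object and its pose are determined.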

As seen in the section on experiments in the finger gap domain, there are practical applications for the multiple
observation methodology even in the restricted case of a sensor with a single degree of freedom. Slight modifications of the
problem statement make many more applications possible. For example, one typical problem is that of determining
the best positions for N fixed sensors to perform a given task. Using the planning methodology outlined in this report,
this problem could be solved by expanding the resolution tree to N observations, and selecting the subtree yielding the
minimum expected number of ambiguous cases.
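
For small discretized models, the fixed-sensor variant can be explored by brute force rather than by expanding a resolution tree. The sketch below is such a substitute, under the assumptions of a uniform prior over poses and the view convention `(pose + sensor_pos) mod n`: it counts the poses whose combined class signature is shared with a pose in a different aspect, and picks the sensor positions that minimize that count. Function names and tables are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Sequence, Tuple


def ambiguous_count(class_at: Sequence[int], aspect_at: Sequence[int],
                    positions: Tuple[int, ...]) -> int:
    """Number of poses left ambiguous when observed from all given positions."""
    n = len(class_at)
    groups = defaultdict(set)
    for pose in range(n):
        signature = tuple(class_at[(pose + s) % n] for s in positions)
        groups[signature].add(pose)
    # A signature is ambiguous if poses from more than one aspect produce it.
    return sum(len(poses) for poses in groups.values()
               if len({aspect_at[p] for p in poses}) > 1)


def best_fixed_sensors(class_at: Sequence[int], aspect_at: Sequence[int],
                       n_sensors: int) -> Tuple[int, ...]:
    """Exhaustively choose n_sensors positions minimizing ambiguous poses."""
    return min(combinations(range(len(class_at)), n_sensors),
               key=lambda pos: ambiguous_count(class_at, aspect_at, pos))


class_at = [1, 1, 2, 1, 1, 2, 2, 2]
aspect_at = [0, 0, 1, 2, 2, 3, 3, 3]
one_sensor = ambiguous_count(class_at, aspect_at, (0,))
two_sensors = ambiguous_count(class_at, aspect_at, (0, 3))
best_pair = best_fixed_sensors(class_at, aspect_at, 2)
```

In this toy model a single sensor leaves all 8 poses ambiguous, while sensors at positions 0 and 3 leave only 2; `best_pair` is a pair that does at least as well.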

The  main  source  of  errors  when  utilizing  multiple  observations  is  not  planning  or  integration  of  observations,  but 
rather  the  processing  of  individual  observations.  All  of  the  errors  encountered  in  our  experiments  on  aspect  resolution 
were  found  to  be  caused  not  by  inadequate  or  incorrect  plans,  but  rather  by  failures  in  the  sensor  processing  modules 
(border  effects)  or  inadequacies  of  the  object  model  (border  effects  and  accidental  observations). 

Border effects are errors that occur near the border between observed classes. They occur because
the  classification  procedure  is  forced  to  make  a  non-linear,  binary  decision  about  the  class  to  which  an  observation 
will  be  assigned.  If  the  object  model  is  inadequate,  the  boundary  is  poorly  defined,  and  errors  in  classification  are 
inevitable.  On  the  other  hand,  even  with  a  very  accurate  object  model,  errors  are  bound  to  occur  near  a  classification 
boundary  because  noise  will  affect  the  observations. 

Accidental observations are observations of an object that do not fit into any known class. Accidental observations
are the result of inadequate object models. Ideally, an object model should include every possible object appearance.
For complex objects, this may not be possible.

In each case, the problem is due to the fact that traditional classifiers make a non-linear decision for each observation.
Thus,  in  the  case  of  border  effects,  noise  forces  the  result  to  be  on  one  side  or  the  other  of  the  boundary  between 
classes,  and  does  not  allow  for  possible  errors.  Similarly,  in  the  case  of  accidental  observations,  an  observation  that 
fits  no  known  class  is  forced  to  become  an  error. 

What  is  needed  is  a  more  robust  approach  to  classification.  With  multiple  observations,  it  becomes  possible  to  assign 
confidences  to  the  results  of  classification,  and  additional  observations  can  update  the  confidences.  This  will  modify 
the  planning  paradigm,  since  planning  must  be  performed  in  part  to  adjust  confidence  values;  that  is,  the  planner  must 
plan  to  acquire  evidence  to  verify  or  discount  various  hypotheses. 
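
As one possible reading of this idea, the hard restriction of candidate poses can be replaced by a Bayesian update of a confidence for each pose. The report does not specify a probabilistic model; the sketch assumes a classifier that reports the true class with probability `p_correct` and spreads the remaining probability evenly over the other classes, with the view convention `(pose + sensor_pos) mod n`. All names and values are hypothetical.

```python
from typing import List, Sequence


def update_confidences(prior: Sequence[float], class_at: Sequence[int],
                       sensor_pos: int, observed_class: int,
                       p_correct: float = 0.9) -> List[float]:
    """Bayesian update of pose confidences after one classified observation."""
    n = len(class_at)
    n_classes = len(set(class_at))
    # Probability of reporting any one particular wrong class.
    p_wrong = (1.0 - p_correct) / (n_classes - 1) if n_classes > 1 else 0.0
    unnormalized = []
    for pose, confidence in enumerate(prior):
        predicted = class_at[(pose + sensor_pos) % n]
        likelihood = p_correct if predicted == observed_class else p_wrong
        unnormalized.append(confidence * likelihood)
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation is impossible under the model")
    return [u / total for u in unnormalized]


class_at = [1, 1, 2, 1, 1, 2, 2, 2]
belief = [1.0 / 8] * 8                                  # uniform prior
belief = update_confidences(belief, class_at, 0, 1)     # saw class 1 at position 0
after_first = list(belief)
belief = update_confidences(belief, class_at, 3, 2)     # saw class 2 at position 3
```

Unlike hard restriction, a pose that disagrees with one observation keeps a small nonzero confidence, so a single border-effect misclassification can be outvoted by later observations.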

Combining multiple observations with some form of probabilistic reasoning opens up the potential for verifiable
object recognition. Rather than merely asserting the correctness of a given hypothesis, it will then become possible
for a computer vision system to plan and execute sequences of observations designed to confirm or deny the
hypothesis. This would clearly have great benefit for applications that require highly dependable object recognition.

The next research goal is to implement and experiment with a planner and on-line executive for the three-degree of
freedom problem. The implementation is expected to be straightforward, and the solution of the more general
problem is expected to open up a much wider variety of applications.

Following  the  implementation  of  the  full  three  degree  of  freedom  planner,  planning  for  evidence  accumulation  will 
become  the  focus  of  this  research.  The  ultimate  goal  of  the  research  is  to  produce  a  highly  reliable  object  recognition 
system, capable of performing a recognition task to any specified level of confidence by making multiple
observations and accumulating evidence.